Abstract

This report studies local asymptotics of P-splines with $p$th degree B-splines and an
$m$th order difference penalty. Earlier work with $p$ and $m$ restricted is extended to the
general case. Asymptotically, penalized splines are kernel estimators with equivalent
kernels depending on $m$, but not on $p$. A central limit theorem provides simple expressions
for the asymptotic mean and variance. Provided it is fast enough, the divergence
rate of the number of knots does not affect the asymptotic distribution. The optimal
convergence rate of the penalty parameter is given.

1 Introduction

Suppose there is a univariate regression model

$$y_i = \mu(x_i) + \epsilon_i, \qquad i = 1, \dots, n,$$

where $\mu(x_i)$ and $\sigma^2(x_i)$ are the conditional expectation and variance of $y_i$ given $x_i$, respectively. For simplicity, we assume $x_i \in [0, 1]$.

The regression function $\mu(x)$ can be modeled by $\sum_{k=1}^{c} \theta_k B_k(x)$, where $c = K + p$ and $B(x) = \{B_1(x), \dots, B_c(x)\}^T$ is a B-spline basis of degree $p$ with knots $0 = \kappa_0 < \kappa_1 < \cdots < \kappa_K = 1$. P-splines (Eilers and Marx, 1996) find $\hat\theta = (\hat\theta_1, \dots, \hat\theta_c)^T$ that minimizes

$$\sum_{i=1}^{n} \Big\{ y_i - \sum_{k=1}^{c} \theta_k B_k(x_i) \Big\}^2 + \lambda^* \sum_{k=m+1}^{c} \{\Delta^m(\theta_k)\}^2, \qquad \lambda^* \ge 0, \qquad (1.1)$$

where $\Delta$ is the difference operator, i.e., $\Delta(\theta_k) = \theta_k - \theta_{k-1}$ and $\Delta^m = \Delta(\Delta^{m-1})$, and $\lambda^*$ is the smoothing parameter. Minimizing (1.1) gives

$$(B^T B/M + \lambda D^T D)\,\hat\theta = B^T y/M, \qquad (1.2)$$

where $M = n/K$, $\lambda = \lambda^* K/n$, $y = (y_1, \dots, y_n)^T$, $B = \{B(x_1), \dots, B(x_n)\}^T$ is an $n \times c$ matrix, and $D$ is the $m$th order differencing matrix of dimension $(c - m) \times c$. For simplicity of notation, let

$$\Lambda = B^T B/M + \lambda D^T D, \qquad (1.3)$$

which is the smoother matrix for P-splines. Then the estimate is given by

$$\hat\mu(x) = B^T(x)\hat\theta = B^T(x)\Lambda^{-1} B^T y/M. \qquad (1.4)$$

For simplicity, we assume $x_1 = 1/(2n), x_2 = 3/(2n), \dots, x_n = (2n-1)/(2n)$, i.e., the
response is observed at equally spaced design points. We also assume that $M$ is an integer to
simplify some proofs; this assumption is for convenience only and could be avoided. The case
when the fixed design points are not equally spaced is considered in Section 6.
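
As a small illustration of the penalty term in (1.1), the following Python sketch builds the $m$th order differencing matrix $D$ from binomial coefficients and applies it to a coefficient vector. The helper names `difference_matrix` and `apply_matrix` are hypothetical and introduced only for this illustration; they are not part of the original report.

```python
from math import comb

def difference_matrix(c, m):
    """Return the (c - m) x c matrix D of the m-th order difference operator."""
    # Coefficients of Delta^m on (theta_i, ..., theta_{i+m}); for m = 1 this is (-1, 1).
    coeffs = [(-1) ** (m - j) * comb(m, j) for j in range(m + 1)]
    D = [[0] * c for _ in range(c - m)]
    for i in range(c - m):
        for j, coef in enumerate(coeffs):
            D[i][i + j] = coef
    return D

def apply_matrix(D, theta):
    """Compute the matrix-vector product D theta."""
    return [sum(d * t for d, t in zip(row, theta)) for row in D]

# Second-order differences of a linear sequence vanish, so a linear trend is unpenalized.
D2 = difference_matrix(6, 2)
penalty_terms = apply_matrix(D2, [0, 1, 2, 3, 4, 5])
```

The design choice here is that an $m$th order difference penalty leaves polynomial coefficient sequences of degree below $m$ unpenalized, which is the property the sketch checks.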






2 Review of Theoretical Study 

Penalized splines have become popular in recent years because they use fewer knots,
and thus need less computation, than smoothing splines. Ruppert et al. (2003) treat penalized
splines extensively and give numerous applications.

However, the theory of penalized splines has remained an interesting but challenging
problem. Opsomer and Hall (2005) first studied the asymptotic theory of penalized splines
when $K$, the number of knots, is infinite. Li and Ruppert (2008) derived the first asymptotic
distribution for low-degree splines and low-order penalties. Wang et al. (2009)
related penalized splines to certain ordinary differential equations (ODEs) and, by studying
the Green's functions associated with those ODEs, derived the asymptotic
distribution of penalized splines.

In contrast to Li and Ruppert (2008), Kauermann et al. (2009) considered the situation
in which $K$ increases at a moderate rate. Although they did not obtain explicit expressions for
the asymptotic bias and variance, they generalized their results to non-normal responses.
Claeskens et al. (2009) showed that, depending on whether $K \to \infty$ at a sufficiently
fast or a sufficiently slow rate, the asymptotic distribution of penalized splines is close to
that of either a smoothing spline or a regression spline. Correspondingly, they referred to these two
cases as the large $K$ and the small $K$ scenarios. The large $K$ scenario is closest to current practice:
as discussed, for example, in O'Sullivan (1986), Eilers and Marx (1996), and Ruppert et al.
(2003), a relatively large number of knots is used and overfitting is controlled by a careful
choice of the smoothing parameter.

One general approach to the theory of penalized splines is the equivalent kernel
method, which was first used by Silverman (1984) to study the asymptotics of smoothing
splines. The equivalent kernel method has also been useful in studying the asymptotics of P-splines
(Li and Ruppert, 2008; Wang et al., 2009).

Independently of Wang et al. (2009), we extend the results of Li and Ruppert (2008) and
provide an explicit expression for the asymptotic distribution of P-splines at an interior point.
We also derive the asymptotic distribution of P-splines near the boundary, acknowledging the
work of Wang et al. (2009). The conjecture that, provided it is fast enough, the divergence
rate of the number of knots does not affect the asymptotic distribution of penalized splines
is confirmed in this paper.

The remainder of this chapter is organized as follows. In Section 3 we summarize our
main results. In Section 4 we provide a general introduction to our method and present
some technical results. In Section 5 we prove the main results of Section 3. In Section 6
we consider irregularly spaced data. In Section 7 we give an example illustrating the idea
of binning irregularly spaced data. In Section 8 we conclude this chapter with some
discussion.



3 Main Results

In this section, we summarize the main results. All derivations and proofs are given in
Sections 4 and 5. For notational convenience, $a \sim b$ means that $a/b$ converges to 1. The
big "O" and small "o" notation is with respect to $n$. Throughout this chapter, $a = O(b)$
means that $|a/b|$ converges to some finite nonnegative number as $n$ goes to infinity, and $a = o(b)$
means that $|a/b|$ converges to 0. We also denote by $\mu^{(k)}(x)$ the $k$th derivative of the function $\mu(x)$.
We need the following definition.

Definition 3.1. We define a kernel function

$$H_m(x) = \sum_{\nu=1}^{m} \frac{\psi_\nu}{2m} \exp(-\psi_\nu |x|),$$

where $\psi_1, \dots, \psi_m$ are the $m$ complex roots of $x^{2m} + (-1)^m = 0$ such that all $\psi_\nu$ ($1 \le \nu \le m$)
have positive real parts.

A kernel estimator with the kernel $H_m$ is of the form $(nh_n)^{-1} \sum_i y_i H_m\{h_n^{-1}(x - x_i)\}$, where
$h_n$ is the bandwidth. As shown in Lemma 9.13, $H_m$ is of order $2m$, which determines the
convergence rate of the corresponding kernel estimator. Proposition 3.1 shows that the P-spline
estimator at an interior point is asymptotically equivalent to the above kernel estimator.

Proposition 3.1. Assume the following conditions are satisfied.

1. There exists a constant $\delta > 0$ such that $\sup_i E(|y_i|^{2+\delta}) < \infty$.

2. The regression function $\mu(x)$ has a continuous $2m$th order derivative.

3. The variance function $\sigma^2(x)$ is continuous.

4. The random errors $\epsilon_i$, $1 \le i \le n$, are mutually independent.

5. The covariates satisfy $x_i = (i - 1/2)/n$, $1 \le i \le n$.

Let $\psi_0 = \min\{\mathrm{Re}(\psi_1), \dots, \mathrm{Re}(\psi_m)\}$, where $\mathrm{Re}(\cdot)$ gives the real part of a complex number. Let
$h_n = \lambda^{1/(2m)}/K$. Assume $h_n = o(1)$ and $(Kh_n)^{-1} = o(1)$. Let $\hat\mu(x)$ be the P-spline estimator
using an $m$th order difference penalty and $p$th degree B-splines with equally spaced knots. Fix
$x \in (0, 1)$. Let $\mu^*(x) = (nh_n)^{-1} \sum_i y_i H_m\{h_n^{-1}(x - x_i)\}$. Then

$$E\{\hat\mu(x) - \mu^*(x)\} = O\{(Kh_n)^{-2}\},$$

$$\mathrm{var}\{\hat\mu(x) - \mu^*(x)\} = o\{(nh_n)^{-1}\}.$$
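
To make the kernel estimator $\mu^*(x)$ concrete, here is a minimal Python sketch. It assumes the equivalent kernel has the form $H_m(x) = \sum_{\nu=1}^m \psi_\nu (2m)^{-1} \exp(-\psi_\nu |x|)$, which agrees with the closed forms in Examples 5.1 and 5.2. The function names `kernel_roots`, `H` and `kernel_estimate`, and the values of `n` and `h`, are illustrative choices, not taken from the report.

```python
import cmath
import math

def kernel_roots(m):
    """Roots of x**(2m) + (-1)**m = 0 that have positive real part."""
    # x**(2m) = (-1)**(m+1) = exp(i*pi*(m+1)), so the 2m roots sit at these angles.
    all_roots = [cmath.exp(1j * math.pi * (m + 1 + 2 * j) / (2 * m)) for j in range(2 * m)]
    return [r for r in all_roots if r.real > 1e-12]

def H(m, x):
    """Assumed equivalent kernel: sum of psi/(2m) * exp(-psi*|x|) over the m roots."""
    return sum(psi / (2 * m) * cmath.exp(-psi * abs(x)) for psi in kernel_roots(m)).real

def kernel_estimate(x, xs, ys, m, h):
    """mu*(x) = (n h)^{-1} * sum_i y_i * H_m((x - x_i) / h)."""
    n = len(xs)
    return sum(y * H(m, (x - xi) / h) for xi, y in zip(xs, ys)) / (n * h)

# Equally spaced design x_i = (i - 1/2)/n with a constant response:
# the kernel weights should then sum to approximately one at an interior point.
n = 2000
xs = [(i - 0.5) / n for i in range(1, n + 1)]
ys = [1.0] * n
est = kernel_estimate(0.5, xs, ys, m=2, h=0.02)
```

With a constant response the estimate reduces to a Riemann sum of the kernel, so it is a quick sanity check that the assumed $H_m$ integrates to one.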

Theorem 3.1. Use the same notation as in Proposition 3.1 and assume that all conditions and
assumptions there are satisfied. Suppose that $K \sim Cn^{\tau}$ with $\tau > (m+1)/(4m+1)$, $h_n \sim h n^{-1/(4m+1)}$
for positive constants $C$ and $h$, and $\lambda \sim (Kh_n)^{2m}$. For any $x \in (0, 1)$, we have
that

$$n^{2m/(4m+1)} \{\hat\mu(x) - \mu(x)\} \Rightarrow N\{\tilde\mu(x), V(x)\}$$

in distribution as $n \to \infty$, where

$$\tilde\mu(x) = (-1)^{m+1} h^{2m} \mu^{(2m)}(x), \qquad (3.1)$$
$$V(x) = \sigma^2(x) \int H_m^2(u)\,du. \qquad (3.2)$$

Remark 3.1. Stone (1980) gave the optimal rates of convergence for nonparametric estimators.
For a univariate smooth function $\mu(x)$ with a continuous $2m$th derivative, the corresponding
optimal rate of convergence for estimating $\mu(x)$ at any interior point is $n^{-2m/(4m+1)}$.
Hence the P-spline estimator achieves the optimal rate of convergence.
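
The rate conditions of Theorem 3.1 can be turned into simple arithmetic. The sketch below is illustrative only: the function name `interior_rates` is hypothetical, the constants `C` and `h` are placeholders, and the relations used are $K = Cn^{\tau}$, $h_n = h n^{-1/(4m+1)}$, $\lambda = (Kh_n)^{2m}$ and $\lambda^* = \lambda n/K$ as stated in the text.

```python
def interior_rates(n, m, tau, C=1.0, h=1.0):
    """Knot number, bandwidth and penalty sizes implied by the interior-point rates."""
    tau_min = (m + 1) / (4 * m + 1)
    if tau <= tau_min:
        # The theorem requires the number of knots to diverge fast enough.
        raise ValueError("tau must exceed (m + 1) / (4m + 1)")
    K = C * n ** tau                          # number of knots (not rounded here)
    h_n = h * n ** (-1.0 / (4 * m + 1))       # equivalent bandwidth
    lam = (K * h_n) ** (2 * m)                # lambda, from h_n = lambda^{1/(2m)} / K
    lam_star = lam * n / K                    # lambda* = lambda * n / K
    rate = n ** (-2.0 * m / (4 * m + 1))      # convergence rate of the estimator
    return {"K": K, "h_n": h_n, "lambda": lam, "lambda_star": lam_star, "rate": rate}

rates = interior_rates(n=512, m=2, tau=2.0 / 3.0)
```

For $m = 2$ the bandwidth shrinks like $n^{-1/9}$ and the estimator converges at rate $n^{-4/9}$; the choice $n = 512$ just makes these powers easy to inspect.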

Theorem 3.2. Assume conditions (1), (3), (4) and (5) in Proposition 3.1 hold. Assume $\mu(x)$
has a continuous $m$th derivative over $[0, 1]$. Suppose that $K \sim Cn^{\tau}$ with $\tau > (m+1)/(2m+1)$,
$h_n \sim h n^{-1/(2m+1)}$ for positive constants $C$ and $h$, and $\lambda \sim (Kh_n)^{2m}$. Let $\hat\mu(x)$ be the penalized
estimator with an $m$th order difference penalty and $p \ge 1$ degree B-splines with equally spaced
knots. Assume $x \sim c_x h_n$ where $c_x$ is a constant. Then we have that

$$n^{m/(2m+1)} \{\hat\mu(x) - \mu(x)\} \Rightarrow N\{\tilde\mu(x), V(x)\}$$

in distribution as $n \to \infty$, where $\tilde\mu(x)$ and $V(x)$ are the asymptotic mean and variance obtained from (5.14) and (5.15) in the proof in Section 5.2. Here $H_{b,m}$ is defined in (5.11).

Remark 3.2. Theorems 3.1 and 3.2 show that the P-spline smoother has a slower rate of
convergence at the boundary than in the interior.

4 Preliminary Derivation

We consider the large $K$ scenario (Claeskens et al., 2009) and assume that $K$ and the smoothing
parameter $\lambda$ increase with $n$ at rates specified later.

The matrix $\Lambda$ in (1.3) is symmetric and banded. For $q < k < c - q$ with
$q = \max(p, m)$, the $k$th column of $\Lambda$ (denoted by $\Lambda_k$) is

$$(0, \dots, 0, \omega_q, \dots, \omega_1, \omega_0, \omega_1, \dots, \omega_q, 0, \dots, 0)^T$$

with the $k$th element being $\omega_0$. We need the following equation



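$$\omega_q + \omega_{q-1}\rho + \cdots + \omega_0 \rho^{q} + \omega_1 \rho^{q+1} + \cdots + \omega_q \rho^{2q} = 0. \qquad (4.1)$$

Equation (4.1) has a compact form

$$\lambda(-1)^m (1 - \rho)^{2m} \rho^{q-m} + \rho^{q-p} P(\rho) = 0, \qquad (4.2)$$

where

$$P(x) = u_p + u_{p-1}x + \cdots + u_0 x^{p} + u_1 x^{p+1} + \cdots + u_p x^{2p} \qquad (4.3)$$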

with the $k$th column of $B^T B/M$ being

$$(0, \dots, 0, u_p, \dots, u_1, u_0, u_1, \dots, u_p, 0, \dots, 0)^T. \qquad (4.4)$$

Let $\{\rho_\nu, \nu = 1, \dots, q\}$ be the $q$ roots of (4.2) such that, when $\lambda$ is large, the real parts of the
first $m$ roots are all positive and less than or equal to 1 and, moreover, if $p > m$, the other $q - m$
roots converge to zero. Define

$$S_k = \sum_{\nu=1}^{q} a_\nu T_k(\rho_\nu), \qquad (4.5)$$

where

$$T_k(\rho) = (\rho^{k-1}, \dots, \rho, 1, \rho, \dots, \rho^{c-k})^T. \qquad (4.6)$$

For $1 \le \nu \le q$ and $2q < k < c - 2q$, it can be shown that $T_k(\rho_\nu)$ is orthogonal to all columns
of $\Lambda$ except the first $q$ columns, the last $q$ columns and the $j$th column with $|k - j| < q$. The
coefficient vector $a = (a_1, \dots, a_q)^T$ can be chosen so that $S_k$ is orthogonal to all columns of
$\Lambda$ except the $k$th column, the first $q$ columns and the last $q$ columns. It shall be shown later
in this section that $a$ does not depend on $k$. Specifically, we find a unique $a$ such that

$$S_k^T \Lambda_k = 1 \quad \text{and} \quad S_k^T \Lambda_j = 0, \quad 0 < |k - j| \le q - 1, \qquad (4.7)$$

where $\Lambda_k$ is the $k$th column of $\Lambda$ as before.
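
The orthogonality construction can be checked numerically in the simplest setting. The sketch below assumes $m = 1$ and, as a simplifying assumption not made in the text, takes $B^T B/M$ to be the identity (the degree-zero case $p = 0$), so that the banded matrix is the identity plus $\lambda D^T D$ with $\omega_0 = 1 + 2\lambda$ and $\omega_1 = -\lambda$. With $q = 1$ there is a single root $\rho$ and a single coefficient $a$ with $a^{-1} = \omega_1(\rho - \rho^{-1})$. The function name `interior_check` is hypothetical.

```python
import math

def interior_check(c=41, k=21, lam=25.0):
    """Return the vector S_k^T Lambda for m = 1 with B^T B / M taken as the identity."""
    # Lambda = I + lam * D^T D, where D^T D is tridiagonal (-1, 2, -1) with corner entries 1.
    Lam = [[0.0] * c for _ in range(c)]
    for j in range(c):
        Lam[j][j] = 1.0 + lam * (1.0 if j in (0, c - 1) else 2.0)
        if j + 1 < c:
            Lam[j][j + 1] = Lam[j + 1][j] = -lam
    # Root of -lam*(1 - rho)^2 + rho = 0 with 0 < rho <= 1.
    rho = ((2 * lam + 1) - math.sqrt(4 * lam + 1)) / (2 * lam)
    a = 1.0 / (-lam * (rho - 1.0 / rho))
    # S_k = a * T_k(rho); k is 1-based, list indices are 0-based.
    S = [a * rho ** abs(r - (k - 1)) for r in range(c)]
    return [sum(S[r] * Lam[r][j] for r in range(c)) for j in range(c)]

v = interior_check()
```

In this toy case `v` equals one at the $k$th position, vanishes at every other interior column, and is small but non-zero at the two boundary columns, which mirrors the exponentially small boundary terms that appear later in the derivation.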

Fix $x \in (0, 1)$. By (1.4), we need only consider the non-zero $B_k(x)$. Hence we assume
$k \in (Kx - p - 1, Kx + p + 1)$. By (4.7) and the definition of $S_k$, there exists a constant $C > 0$
such that

$$S_k^T \Lambda_j = O[\exp\{-C\lambda^{-1/(2m)} K \min(x, 1 - x)\}], \qquad 1 \le j \le q \ \text{and} \ c - q < j \le c. \qquad (4.8)$$

Let $e_k$ be a vector of length $c$ with $k$th entry 1 and all other elements 0. Define $\tilde\theta_k = (S_k^T \Lambda)\hat\theta$.
Equation (1.2) implies $\tilde\theta_k = S_k^T B^T y/M$. By (4.7), (4.8) and a lemma in Section 9, $\hat\theta_k - \tilde\theta_k = (e_k^T - S_k^T \Lambda)\hat\theta = \sum_{i=1}^{n} b_{i,k} y_i$, where $b_{i,k} = O[\exp\{-C\lambda^{-1/(2m)} K \min(x, 1 - x)\}]$. Let $S_{k,r}$ be the $r$th element
of $S_k$. By (1.4),

$$\begin{aligned}
\hat\mu(x) &= \sum_{k=1}^{c} B_k(x) S_k^T B^T y/M + \sum_{k=1}^{c} B_k(x)(\hat\theta_k - \tilde\theta_k) \\
&= \sum_{k=1}^{c} B_k(x) \Big\{ \sum_{r=1}^{c} S_{k,r} \sum_{i=1}^{n} B_r(x_i) y_i \Big\}/M + \sum_{|k - Kx| < p} B_k(x) \Big( \sum_{i=1}^{n} b_{i,k} y_i \Big) \\
&= \sum_{i=1}^{n} y_i \Big\{ \sum_{k,r} B_k(x) B_r(x_i) S_{k,r}/M + b_i(x) \Big\}, \qquad (4.9)
\end{aligned}$$

where $b_i(x) = \sum_{|k - Kx| < p} B_k(x) b_{i,k} = O[\exp\{-C\lambda^{-1/(2m)} K \min(x, 1 - x)\}]$. We assume appropriate
regularity conditions on the data $y$ so that interchanging the sums in (4.9) is valid. Note
that $\sum_{k,r} B_k(x) B_r(x_i) S_{k,r}/M + b_i(x)$ in (4.9) is the weight of the $i$th observation for estimating
$\mu(x)$.

For the boundary case, assume $x$ goes to 0 at a rate of $\lambda^{1/(2m)}/K$, i.e., $x \sim c_x \lambda^{1/(2m)}/K$,
where $c_x$ is a constant. We assume that $\lambda^{1/(2m)}/K$ converges to 0. Assume $k \in (Kx - p - 1, Kx + p + 1)$;
then $S_k$ is orthogonal to all columns of $\Lambda$ except the $k$th, the first $q$ and the
last $q$ columns. Furthermore, $T_1(\rho)$ defined in (4.6) can be shown to be orthogonal to all columns
of $\Lambda$ except the first $q$ and the last $q$ columns. Define $R_k = \sum_{\nu=1}^{q} \tilde a_{k,\nu} T_1(\rho_\nu)$. Then $S_k + R_k$ is
orthogonal to all columns of $\Lambda$ except the $k$th, the first $q$ and the last $q$ columns for an arbitrary
coefficient vector $\tilde a_k = (\tilde a_{k,1}, \dots, \tilde a_{k,q})^T$. We find the coefficient vector $\tilde a_k$ so that $S_k + R_k$ is
orthogonal to all columns of $\Lambda$ except the $k$th and the last $q$ columns. Specifically, we find
$\tilde a_k$ such that

$$(S_k + R_k)^T \Lambda_k = 1 \quad \text{and} \quad (S_k + R_k)^T \Lambda_j = 0, \quad 0 < j \le c - q, \ j \ne k. \qquad (4.10)$$

Then there exists a constant $C_0 > 0$ such that for $c - q < j \le c$, $(S_k + R_k)^T \Lambda_j = O[\exp\{-C_0 \lambda^{-1/(2m)} K\}]$. Similarly to (4.9), we can derive that

$$\hat\mu(x) = \frac{1}{M} \sum_{i=1}^{n} y_i \Big[ \sum_{k,r} B_k(x) B_r(x_i)(S_{k,r} + R_{k,r}) + b_{i,0}(x) \Big], \qquad (4.11)$$

where $R_{k,r}$ is the $r$th element of $R_k$ with $R_{k,r} = \sum_{\nu=1}^{q} \tilde a_{k,\nu} \rho_\nu^{r-1}$, and $b_{i,0}(x) = O[\exp\{-C_0 \lambda^{-1/(2m)} K\}]$.
In the next subsections, we shall derive the coefficients $\rho_\nu$, $a_\nu$ and $\tilde a_{k,\nu}$.

4.1 Derivation of $\rho_\nu$
4.1.1 The case $p \le m$

In this case $q = m$. Equation (4.2) becomes

$$\lambda(-1)^m (1 - \rho)^{2m} + \rho^{m-p} P(\rho) = 0 \qquad (4.12)$$

and $\rho_1, \dots, \rho_m$ are the $m$ complex roots of (4.12) such that the real part of $\rho_\nu$ is positive and
less than or equal to 1. Proposition 4.1 below shows that $\rho_\nu$ exists and has an explicit form.

Proposition 4.1. As $\lambda \to \infty$, the roots of equation (4.12) take the following forms

$$\rho_\nu = 1 - \psi_\nu \lambda^{-1/(2m)} + \tfrac{1}{2} \psi_\nu^2 \lambda^{-1/m} + O\{\lambda^{-3/(2m)}\}, \qquad 1 \le \nu \le 2m, \qquad (4.13)$$

where $\psi_1, \dots, \psi_{2m}$ are the roots of $x^{2m} + (-1)^m = 0$.

Remark 4.1. To be consistent with the definition in Section 3, we assume that for the first $m$
roots, $\psi_\nu$ have positive real parts and for the last $m$ roots, $\psi_\nu$ have negative real parts. The
real parts of $\rho_1, \dots, \rho_m$ are hence positive and less than or equal to 1.



Proof of Proposition 4.1. The existence of $2m$ roots for equation (4.12) is obvious from
complex analysis. Suppose $1 - \delta_1$ is a root of equation (4.12). Then

$$0 = G_{1,\lambda}(\delta_1) = \lambda(-1)^m \delta_1^{2m} + (1 - \delta_1)^{m-p} P(1 - \delta_1).$$

Because the leading coefficient of the polynomial $G_{1,\lambda}(\delta_1)$ is $\lambda(-1)^m$ (or $\lambda(-1)^m + u_q$ if
$m = p$), it is easy to see that $\delta_1$ is uniformly bounded as $\lambda \to \infty$. Hence $(1 - \delta_1)^{m-p} P(1 - \delta_1)$
is uniformly bounded, which implies that $\lambda(-1)^m \delta_1^{2m}$ is uniformly bounded. It follows that
$\lim_{\lambda \to \infty} \delta_1 = 0$. Then

$$\lim_{\lambda \to \infty} G_{1,\lambda}(\delta_1) = \lim_{\lambda \to \infty} \lambda(-1)^m \delta_1^{2m} + 1 = 0,$$

which implies

$$\delta_1 = \psi_\nu \lambda^{-1/(2m)} (1 + \delta_2), \qquad (4.14)$$

where $\psi_\nu$ is a root of $x^{2m} + (-1)^m = 0$ for some $\nu$ and $\lim_{\lambda \to \infty} \delta_2 = 0$. Substituting (4.14)
into $G_{1,\lambda}$ (denoted by $G_{2,\lambda}(\delta_2)$) gives

$$0 = G_{2,\lambda}(\delta_2) = -(1 + \delta_2)^{2m} + \{1 - \psi_\nu \lambda^{-1/(2m)}(1 + \delta_2)\}^{m-p} P\{1 - \psi_\nu \lambda^{-1/(2m)}(1 + \delta_2)\}. \qquad (4.15)$$

It is easy to show that

$$\{1 - \psi_\nu \lambda^{-1/(2m)}(1 + \delta_2)\}^{m-p} = 1 - (m - p)\psi_\nu \lambda^{-1/(2m)} + o\{\lambda^{-1/(2m)}\}, \qquad (4.16)$$
$$P\{1 - \psi_\nu \lambda^{-1/(2m)}(1 + \delta_2)\} = P(1) - P'(1)\psi_\nu \lambda^{-1/(2m)} + o\{\lambda^{-1/(2m)}\}. \qquad (4.17)$$

Equalities (4.15)-(4.17), as well as Lemma 9.3, imply

$$\delta_2 = -\frac{(m - p) + P'(1)}{2m} \psi_\nu \lambda^{-1/(2m)} (1 + \delta_3) = -\tfrac{1}{2} \psi_\nu \lambda^{-1/(2m)} (1 + \delta_3),$$

where $\lim_{\lambda \to \infty} \delta_3 = 0$. By similar analysis, we can show that $\delta_3 = O\{\lambda^{-3/(2m)}\}$. Hence a
root of equation (4.12) takes the form

$$1 - \psi_\nu \lambda^{-1/(2m)} + \tfrac{1}{2} \psi_\nu^2 \lambda^{-1/m} + O\{\lambda^{-3/(2m)}\}, \qquad \text{for some } \nu.$$

Thus, equation (4.12) has $2m$ roots that take the above form and each root has a $\psi_\nu$ that is
a root of $x^{2m} + (-1)^m = 0$.
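
The expansion (4.13) can be checked in a case where the root is available in closed form. The sketch below takes $m = 1$ and, as an assumption made only for this illustration, $P(\rho) \equiv 1$ (the degree-zero case $p = 0$), so that equation (4.12) reads $-\lambda(1 - \rho)^2 + \rho = 0$ and $\psi_1 = 1$. The function names `exact_root` and `expansion` are hypothetical.

```python
import math

def exact_root(lam):
    """Root in (0, 1] of -lam*(1 - rho)^2 + rho = 0, i.e. lam*rho^2 - (2*lam + 1)*rho + lam = 0."""
    return ((2 * lam + 1) - math.sqrt(4 * lam + 1)) / (2 * lam)

def expansion(lam, m=1, psi=1.0):
    """First three terms of (4.13): 1 - psi*lam^(-1/(2m)) + psi^2 * lam^(-1/m) / 2."""
    eps = psi * lam ** (-1.0 / (2 * m))
    return 1 - eps + 0.5 * eps ** 2

# The gap between the exact root and the truncated expansion should shrink like lam^(-3/2).
gaps = {lam: abs(exact_root(lam) - expansion(lam)) for lam in (1e2, 1e4, 1e6)}
```

In this special case the exact root can be expanded by hand as $1 - \epsilon + \epsilon^2/2 - \epsilon^3/8 + \cdots$ with $\epsilon = \lambda^{-1/2}$, so the remainder is indeed of order $\lambda^{-3/(2m)}$.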

4.1.2 The case $p > m$

When $p > m$, equation (4.2) becomes

$$\lambda(-1)^m (1 - \rho)^{2m} \rho^{p-m} + P(\rho) = 0. \qquad (4.18)$$

Similar to Proposition 4.1, we have the following

Proposition 4.2. As $\lambda \to \infty$, $2m$ roots of equation (4.18) take the forms in (4.13), and
additionally, $p - m$ roots of equation (4.18) take the following forms

$$\rho_\nu = \psi_\nu (u_q/\lambda)^{1/(p-m)} + O\{\lambda^{-2/(p-m)}\}, \qquad m + 1 \le \nu \le p, \qquad (4.19)$$

where $\psi_{m+1}, \dots, \psi_p$ are the roots of $x^{p-m} + (-1)^m = 0$.

Proof of Proposition 4.2. Assume $\delta_0$ is a root of equation (4.18). Consider the case in which
$\limsup_{\lambda \to \infty} |\delta_0| \ne 0$ and is bounded. Then a proof similar to that of Proposition 4.1 gives
$2m$ roots taking the forms in (4.13). Now consider the case $\limsup_{\lambda \to \infty} |\delta_0| = 0$. Then $P(\delta_0)$
converges to $u_q$ as $\lambda \to \infty$, which implies that $\lambda(-1)^m \delta_0^{p-m}$ converges to $-u_q$. It follows that
$\delta_0 = \psi_\nu (u_q/\lambda)^{1/(p-m)} (1 + \delta_1)$, where $\psi_\nu$ is a root of $x^{p-m} + (-1)^m = 0$ for some $\nu$ and
$\lim_{\lambda \to \infty} \delta_1 = 0$. A derivation similar to that in the proof of Proposition 4.1 gives (4.19). To complete
the proof, notice that for the case $\limsup_{\lambda \to \infty} |\delta_0| = \infty$, we can derive the remaining $p - m$ unbounded
roots of equation (4.18).

4.2 Derivation of $a_\nu$

In this subsection, we shall establish the following

Proposition 4.3. Assume $q < k < c - q$ and $x \in (0, 1)$. As $\lambda \to \infty$, the vector $a$ satisfying
the constraints in (4.7) is unique, i.e., does not depend on $k$, and has the following form

$$a_\nu = \frac{\psi_\nu}{2m} \lambda^{-1/(2m)} \{1 + O(\lambda^{-1/m})\}, \qquad 1 \le \nu \le m, \qquad (4.20)$$

and if $p > m$,

$$a_\nu = O\{\lambda^{p/(m-p)}\}, \qquad \nu = m + 1, \dots, p.$$

Remark 4.2. Because the proof is lengthy, we sketch it in the
remainder of this subsection.

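For $1 \le \nu \le q$, define $s_j(\rho_\nu) = T_k^T(\rho_\nu) \Lambda_{k-q+j}$ for $1 \le j \le q$. Then $s_j(\rho_\nu) = \sum_{i=0}^{j-1} \omega_{q-i} (\rho_\nu^{j-i} - \rho_\nu^{i-j})$. The constraints in (4.7) give a system of linear equations

$$\begin{pmatrix} s_1(\rho_1) & \cdots & s_1(\rho_q) \\ \vdots & & \vdots \\ s_{q-1}(\rho_1) & \cdots & s_{q-1}(\rho_q) \\ s_q(\rho_1) & \cdots & s_q(\rho_q) \end{pmatrix} \begin{pmatrix} a_1 \\ \vdots \\ a_{q-1} \\ a_q \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.$$

As shall be shown soon, the $a_\nu$'s exist and are unique. Making use of the structure of $s_j(\rho_\nu)$ and
applying row transformations to the above linear equations, we have

$$\begin{pmatrix} \omega_q(\rho_1 - \rho_1^{-1}) & \cdots & \omega_q(\rho_q - \rho_q^{-1}) \\ \vdots & & \vdots \\ \omega_q(\rho_1^{q-1} - \rho_1^{1-q}) & \cdots & \omega_q(\rho_q^{q-1} - \rho_q^{1-q}) \\ \omega_q(\rho_1^{q} - \rho_1^{-q}) & \cdots & \omega_q(\rho_q^{q} - \rho_q^{-q}) \end{pmatrix} \begin{pmatrix} a_1 \\ \vdots \\ a_{q-1} \\ a_q \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.$$

Further row transformations on the above equations give

$$\begin{pmatrix} 1 & \cdots & 1 \\ \rho_1 + \rho_1^{-1} - 2 & \cdots & \rho_q + \rho_q^{-1} - 2 \\ \vdots & & \vdots \\ (\rho_1 + \rho_1^{-1} - 2)^{q-2} & \cdots & (\rho_q + \rho_q^{-1} - 2)^{q-2} \\ (\rho_1 + \rho_1^{-1} - 2)^{q-1} & \cdots & (\rho_q + \rho_q^{-1} - 2)^{q-1} \end{pmatrix} \begin{pmatrix} a_1 \omega_q (\rho_1 - \rho_1^{-1}) \\ \vdots \\ a_{q-1} \omega_q (\rho_{q-1} - \rho_{q-1}^{-1}) \\ a_q \omega_q (\rho_q - \rho_q^{-1}) \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.$$

In the above equations, the matrix multiplying the column of coefficients is a $q \times q$ Vandermonde
matrix. By the determinant property of Vandermonde matrices, the solution to the
above linear equations exists and is unique because $\rho_\nu + \rho_\nu^{-1} - 2$, $1 \le \nu \le q$, are all different.
Furthermore, the solution to the above equations does not depend on $k$;
hence $a$ is the same for all $k$ such that $q < k < c - q$. By Cramer's rule, we obtain for $1 \le \nu \le q$

$$\begin{aligned}
a_\nu \omega_q (\rho_\nu - 1/\rho_\nu) &= (-1)^{q+\nu} \frac{\prod_{1 \le i < j \le q;\ i, j \ne \nu} (\rho_j + \rho_j^{-1} - \rho_i - \rho_i^{-1})}{\prod_{1 \le i < j \le q} (\rho_j + \rho_j^{-1} - \rho_i - \rho_i^{-1})} \\
&= \frac{(-1)^{q+\nu} (-1)^{q-\nu}}{\prod_{1 \le j \ne \nu \le q} (\rho_\nu + \rho_\nu^{-1} - \rho_j - \rho_j^{-1})} \\
&= \frac{1}{\prod_{1 \le j \ne \nu \le q} (\rho_\nu + \rho_\nu^{-1} - \rho_j - \rho_j^{-1})}. \qquad (4.21)
\end{aligned}$$

Hence

$$a_\nu^{-1} = \omega_q (\rho_\nu - \rho_\nu^{-1}) \prod_{1 \le j \ne \nu \le q} (\rho_\nu + \rho_\nu^{-1} - \rho_j - \rho_j^{-1}). \qquad (4.22)$$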

i<j^v<q 



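4.2.1 The case $p \le m$

By (4.13), for $1 \le \nu \le m$,

$$\rho_\nu - \rho_\nu^{-1} = -2\psi_\nu \lambda^{-1/(2m)} + O(\lambda^{-3/(2m)}),$$

and

$$\rho_\nu + \rho_\nu^{-1} - 2 = \psi_\nu^2 \lambda^{-1/m} + O(\lambda^{-2/m}).$$

It follows that for $1 \le j \ne \nu \le m$,

$$\rho_\nu + \rho_\nu^{-1} - \rho_j - \rho_j^{-1} = (\psi_\nu^2 - \psi_j^2) \lambda^{-1/m} + O(\lambda^{-2/m}). \qquad (4.23)$$

Then

$$\begin{aligned}
\prod_{j \ne \nu} (\rho_\nu + \rho_\nu^{-1} - \rho_j - \rho_j^{-1}) &= \lambda^{-1+1/m} \prod_{j \ne \nu} \{(\psi_\nu^2 - \psi_j^2) + O(\lambda^{-1/m})\} \\
&= \lambda^{-1+1/m} \Big\{ \prod_{j \ne \nu} (\psi_\nu^2 - \psi_j^2) + O(\lambda^{-1/m}) \Big\}. \qquad (4.24)
\end{aligned}$$

By Lemma 9.6, equality (4.24) can be simplified to

$$\prod_{j \ne \nu} (\rho_\nu + \rho_\nu^{-1} - \rho_j - \rho_j^{-1}) = (-1)^{m+1} m \psi_\nu^{-2} \lambda^{-1+1/m} \{1 + O(\lambda^{-1/m})\}. \qquad (4.25)$$

In light of (4.21) and (4.25),

$$\{a_\nu \omega_q (\rho_\nu - \rho_\nu^{-1})\}^{-1} = (-1)^{m+1} m \psi_\nu^{-2} \lambda^{-1+1/m} \{1 + O(\lambda^{-1/m})\}.$$

Note that for $p \le m$, $\omega_q = \omega_m = (-1)^m \lambda$ plus a constant, where the constant is the coefficient
of $\rho^m$ in the polynomial $P(\rho)$. Hence $(-1)^m \lambda^{-1} \omega_q = 1 + O(\lambda^{-1})$. It follows that

$$\begin{aligned}
a_\nu^{-1} &= \omega_q (\rho_\nu - 1/\rho_\nu) \prod_{j \ne \nu} (\rho_\nu + 1/\rho_\nu - \rho_j - 1/\rho_j) \\
&= -\omega_q \{2\psi_\nu \lambda^{-1/(2m)} + O(\lambda^{-3/(2m)})\} (-1)^{m+1} m \psi_\nu^{-2} \lambda^{-1+1/m} \{1 + O(\lambda^{-1/m})\} \\
&= 2m (-1)^m \lambda^{-1+1/(2m)} \omega_q \psi_\nu^{-1} \{1 + O(\lambda^{-1/m})\} \\
&= 2m \lambda^{1/(2m)} \psi_\nu^{-1} \{1 + O(\lambda^{-1/m})\}.
\end{aligned}$$

The above derivation establishes (4.20).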

4.2.2 The case $p > m$

To derive $a_\nu$, we need to study (4.22) again. For the term $\rho_\nu + \rho_\nu^{-1} - \rho_j - \rho_j^{-1}$ in (4.22), there
are two new cases besides (4.23):

$$\rho_\nu + \rho_\nu^{-1} - \rho_j - \rho_j^{-1} = \begin{cases} -\psi_j^{-1} (\lambda/\omega_q)^{1/(p-m)} + O(1), & \nu \le m < j, \\ (\psi_\nu^{-1} - \psi_j^{-1}) (\lambda/\omega_q)^{1/(p-m)} + O(1), & \nu > m, \ j > m. \end{cases}$$

It is easy to show that when $\nu > m$, $a_\nu$ is of order $\lambda^{p/(m-p)}$, and when $1 \le \nu \le m$, (4.20) is still
valid. Notice that in this case $\omega_q$ is a constant that depends only on $p$. This completes
the proof of Proposition 4.3.

4.3 Derivation of $\tilde a_{k,\nu}$

In this subsection, we derive the form of $\tilde a_{k,\nu}$ satisfying the constraints in (4.10). Instead
of stating a proposition, we derive the form of $\tilde a_{k,\nu}$ in the text.

Consider the $k$'s satisfying $k \in (Kx - p - 1, Kx + p + 1)$. Since $x$ goes to 0 at a rate of
$\lambda^{1/(2m)}/K$, $k > p + m$. Hence $\{S_k + R_k(x)\}^T \Lambda_k = 1$ is automatically satisfied for arbitrary
$\tilde a_k$. Denote $P = D^T D$ and by $P_k$ the $k$th column of $P$. Note that every row of $B^T B/M$ sums
to 1; hence

$$\{S_k + R_k(x)\}^T (\Lambda_j - \lambda P_j) = O\{\lambda^{-1/(2m)}\} + O\Big( \max_{1 \le \nu \le q} |\tilde a_{k,\nu}| \Big), \qquad j = 1, \dots, q.$$

In light of the constraints in (4.10),

$$\{S_k + R_k(x)\}^T P_j = O\{\lambda^{-1-1/(2m)}\} + \lambda^{-1} O\Big( \max_{1 \le \nu \le q} |\tilde a_{k,\nu}| \Big), \qquad j = 1, \dots, q.$$

For simplicity, denote $O\{\lambda^{-1-1/(2m)}\} + \lambda^{-1} O(\max_{1 \le \nu \le q} |\tilde a_{k,\nu}|)$ by $\xi$. Further simplification
shows that the above is equivalent to

$$\sum_{\nu=1}^{q} (1 - \rho_\nu^{-1})^{m+j-1} a_\nu \rho_\nu^{k-1} + \sum_{\nu=1}^{q} (1 - \rho_\nu)^{m+j-1} \tilde a_{k,\nu} = O(\xi), \qquad j = 1, \dots, m, \qquad (4.26)$$

and if $p > m$,

$$\sum_{\nu=1}^{q} (1 - \rho_\nu^{-1})^{2m} \rho_\nu^{-(j-m-1)} a_\nu \rho_\nu^{k-1} + \sum_{\nu=1}^{q} (1 - \rho_\nu)^{2m} \rho_\nu^{j-m-1} \tilde a_{k,\nu} = O(\xi), \qquad j = m + 1, \dots, q. \qquad (4.27)$$

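4.3.1 The case $p \le m$

Because $k \in (Kx - p - 1, Kx + p + 1)$, $k/\{c_x \lambda^{1/(2m)}\} \to 1$. Hence for $1 \le \nu \le m$, $\rho_\nu^{k-1} \to \exp(-c_x \psi_\nu)$.
Since $q = m$, all $\rho_\nu$'s take the forms in (4.13). As $\lambda \to \infty$, $\rho_\nu \to 1$, $(1 - \rho_\nu)^j \sim \psi_\nu^j \lambda^{-j/(2m)}$,
$(1 - \rho_\nu^{-1})^j \sim (-1)^j \psi_\nu^j \lambda^{-j/(2m)}$ and $a_\nu \sim (2m)^{-1} \psi_\nu \lambda^{-1/(2m)}$. It is easy to show that the leading
term of $\sum_{\nu=1}^{m} (1 - \rho_\nu^{-1})^{m+j-1} a_\nu \rho_\nu^{k-1}$ is $(2m)^{-1} \lambda^{-(m+j)/(2m)} \sum_{\nu=1}^{m} (-1)^{m+j-1} \psi_\nu^{m+j} \exp(-c_x \psi_\nu)$
and the leading term of $\sum_{\nu=1}^{m} (1 - \rho_\nu)^{m+j-1} \tilde a_{k,\nu}$ is $\lambda^{-(m+j-1)/(2m)} \sum_{\nu=1}^{m} \psi_\nu^{m+j-1} \tilde a_{k,\nu}$. Therefore,
we derive that

$$\tilde a_{k,\nu} = \frac{b_{k,\nu}}{2m} \lambda^{-1/(2m)} + O(\lambda^{-1/m}), \qquad 1 \le \nu \le m, \qquad (4.28)$$

for some constant $b_{k,\nu}$. Because of (4.28), $\xi = O\{\lambda^{-1-1/(2m)}\}$. Matching the coefficients of
$\lambda^{-(m+j)/(2m)}$ for the $j$th term in (4.26) gives

$$\sum_{\nu=1}^{m} (-1)^{m+j-1} \psi_\nu^{m+j} \exp(-c_x \psi_\nu) + \sum_{\nu=1}^{m} \psi_\nu^{m+j-1} b_{k,\nu} = 0, \qquad j = 1, \dots, m. \qquad (4.29)$$

To simplify notation, we define $\Psi_{m,1}$ as the $m \times m$ matrix with $(i, j)$th element $\psi_j^{m+i-1}$, $\Psi_{m,2}$
as the $m \times m$ matrix with $(i, j)$th element $(-1)^{m+i} \psi_j^{m+i}$, and $r(x) = (e^{-\psi_1 x}, \dots, e^{-\psi_m x})^T$.
By (4.29),

$$(b_{k,1}, \dots, b_{k,m})^T = \Psi_{m,1}^{-1} \Psi_{m,2} r(c_x). \qquad (4.30)$$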

4.3.2 The case $p > m$

Note that if $\nu > m$, $\rho_\nu = O\{\lambda^{-1/(p-m)}\}$ and $a_\nu = O\{\lambda^{-p/(p-m)}\}$. Equality (4.27) for $j = m + 1$
reduces to

$$(-1)^{m+1} \lambda^{-1-1/(2m)} \frac{1}{2m} \sum_{\nu=1}^{m} \psi_\nu \exp(-c_x \psi_\nu) + (-1)^{m+1} \lambda^{-1} \sum_{\nu=1}^{m} \tilde a_{k,\nu} + \sum_{\nu=m+1}^{q} \tilde a_{k,\nu} = O(\xi),$$

i.e.,

$$\sum_{\nu=m+1}^{q} \tilde a_{k,\nu} = \lambda^{-1} (-1)^{m} \sum_{\nu=1}^{m} \tilde a_{k,\nu} + O(\xi) = O(\xi). \qquad (4.31)$$

Because of (4.31), the analysis in the previous subsection is also valid and (4.30) still holds.

Furthermore, we can derive from (4.27) that

$$\sum_{\nu=m+1}^{q} \tilde a_{k,\nu} \rho_\nu^{j} = O\{\lambda^{-1-1/(2m)}\}, \qquad j = 0, \dots, q - m - 1. \qquad (4.32)$$

It follows from (4.32) that

$$\sum_{\nu=m+1}^{q} \tilde a_{k,\nu} \rho_\nu^{j} = O\{\lambda^{-1-1/(2m)}\}, \qquad \text{for any non-negative integer } j. \qquad (4.33)$$

5 Derivation of Asymptotics

In this section, we prove the main results of Section 3. Specifically, we derive the
asymptotic distribution of P-splines when $x \in (0, 1)$ and when $x$ goes to 0 at a certain rate.
Define $\bar x_k = (k - 1/2)/K$.

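5.1 The Case $x \in (0, 1)$

To prove Proposition 3.1, we need Proposition 5.1 below.

Proposition 5.1. Let $h_n = \lambda^{1/(2m)}/K$. Let $\psi_0 = \min\{\mathrm{Re}(\psi_1), \dots, \mathrm{Re}(\psi_m)\}$, where $\mathrm{Re}(\cdot)$
gives the real part of a complex number. Assume $h_n = o(1)$ and $(Kh_n)^{-1} = o(1)$. For
$x \in (0, 1)$,

$$\begin{aligned}
nh_n \sum_{k,r} B_k(x) B_r(x_i) S_{k,r}/M = {} & H_m\Big( \frac{|x - x_i|}{h_n} \Big) + \delta_{\{p > m\}} \Big[ O\{\lambda^{-2+1/(2m)}\} + \delta_{\{|x - x_i| < (3p+2-m)/K\}} O\{\lambda^{-p/(p-m)+1/(2m)}\} \Big] \\
& + \exp\Big( -\psi_0 \frac{|x - x_i|}{h_n} \Big) \Big[ O(\lambda^{-1/m}) + \delta_{\{m=1\}} \delta_{\{|x - x_i| < (p+1)/(Kh_n)\}} O\{\lambda^{-1/(2m)}\} \Big].
\end{aligned}$$

Here $\delta_{\{p > m\}} = 1$ if $p > m$ and 0 otherwise; the other $\delta$ terms are similarly defined.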
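Proof of Proposition 5.1. By the definition of $S_k$ in (4.5),

$$\sum_{k,r} B_k(x) B_r(x_i) S_{k,r}/M = \sum_{\nu=1}^{q} \Big\{ \sum_{k,r} B_k(x) B_r(x_i) a_\nu \rho_\nu^{|k-r|}/M \Big\}. \qquad (5.1)$$

If $p > m$ and $\nu > m$, $\rho_\nu = O\{\lambda^{-1/(p-m)}\}$ by Proposition 4.2 and $a_\nu$ is of order $\lambda^{-p/(p-m)}$ by
Proposition 4.3. Note that if $|x - x_i| > (3p + 2 - m)/K$, a necessary condition for a nonzero
$B_k(x) B_r(x_i)$ is that $|k - r| > p - m$; hence, for $\nu > m$,

$$\sum_{k,r} B_k(x) B_r(x_i) a_\nu \rho_\nu^{|k-r|}/M = \delta_{\{|x - x_i| < (3p+2-m)/K\}} O\{\lambda^{-p/(p-m)} K n^{-1}\} + O(\lambda^{-2} K n^{-1}). \qquad (5.2)$$

In the above derivation, Lemma 9.2 was used. Fix $1 \le \nu \le m$. Define

$$b_\nu = -\lambda^{1/(2m)} \log(\rho_\nu), \qquad 1 \le \nu \le m.$$

Then by (4.13),

$$b_\nu = \psi_\nu + O(\lambda^{-1/m}), \qquad 1 \le \nu \le m.$$

It follows that

$$\rho_\nu^{|k-r|} = \exp\Big( -b_\nu \frac{|k - r|}{\lambda^{1/(2m)}} \Big) = \exp\Big( -\psi_\nu \frac{|k - r|}{\lambda^{1/(2m)}} \Big) \Big\{ 1 + \frac{|k - r|}{\lambda^{1/(2m)}} O(\lambda^{-1/m}) \Big\}.$$

By the expression for $a_\nu$ in (4.20),

$$a_\nu \rho_\nu^{|k-r|} = \frac{\psi_\nu}{2m} \lambda^{-1/(2m)} \exp\Big( -\psi_\nu \frac{|k - r|}{\lambda^{1/(2m)}} \Big) \Big\{ 1 + \Big( 1 + \frac{|k - r|}{\lambda^{1/(2m)}} \Big) O(\lambda^{-1/m}) \Big\}.$$

In light of Lemma 9.7,

$$\begin{aligned}
2m\,nh_n \Big\{ \sum_{k,r} B_k(x) B_r(x_i) a_\nu \rho_\nu^{|k-r|}/M \Big\} &= \sum_{k,r} B_k(x) B_r(x_i) \psi_\nu \exp\Big( -\psi_\nu \frac{|k - r|}{\lambda^{1/(2m)}} \Big) \Big\{ 1 + \Big( 1 + \frac{|k - r|}{\lambda^{1/(2m)}} \Big) O(\lambda^{-1/m}) \Big\} \\
&= \psi_\nu \exp\Big( -\psi_\nu \frac{|x - x_i|}{h_n} \Big) \big\{ 1 - \psi_\nu g(x, x_i) + O(\lambda^{-1/m}) \big\}. \qquad (5.3)
\end{aligned}$$

Summing (5.3) for $\nu = 1, \dots, m$ gives

$$nh_n \sum_{\nu=1}^{m} \Big\{ \sum_{k,r} B_k(x) B_r(x_i) a_\nu \rho_\nu^{|k-r|}/M \Big\} = H_m\Big( \frac{|x - x_i|}{h_n} \Big) + \exp\Big( -\psi_0 \frac{|x - x_i|}{h_n} \Big) O(\lambda^{-1/m}) - g(x, x_i) Q\Big( \frac{|x - x_i|}{h_n} \Big), \qquad (5.4)$$

where

$$Q(x) = \frac{1}{2m} \sum_{\nu=1}^{m} \psi_\nu^2 \exp(-\psi_\nu |x|).$$

It is easy to show that $|Q(x)| \le \exp(-\psi_0 |x|)$. Lemma 9.5 states that $g(x, x_i) = 0$ if
$|x - x_i| > (p + 1)/K$. Lemma 9.12 states that when $m > 1$, $\sum_{1 \le \nu \le m} \psi_\nu^2 = 0$. Thus if $x$ is close to 0 and
$m > 1$, $\sum_{1 \le \nu \le m} \psi_\nu^2 \exp(-\psi_\nu |x|)$ is of the same order as $x$. Hence,

$$g(x, x_i) Q\Big( \frac{|x - x_i|}{h_n} \Big) = \delta_{\{|x - x_i| < (p+1)/(Kh_n)\}} \exp\Big( -\psi_0 \frac{|x - x_i|}{h_n} \Big) \Big[ O\{(Kh_n)^{-2}\} + \delta_{\{m=1\}} O\{(Kh_n)^{-1}\} \Big]. \qquad (5.5)$$

Equalities (5.2)-(5.5) together prove Proposition 5.1.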

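Proof of Proposition 3.1. By (4.9) and Proposition 5.1,

$$\hat\mu(x) = \frac{1}{nh_n} \sum_{i=1}^{n} y_i H_m\Big( \frac{|x - x_i|}{h_n} \Big) + \frac{1}{nh_n} \sum_{i=1}^{n} y_i r_i(x),$$

where

$$\begin{aligned}
r_i(x) = {} & \exp\Big( -\psi_0 \frac{|x - x_i|}{h_n} \Big) \Big[ O(\lambda^{-1/m}) + \delta_{\{m=1\}} \delta_{\{|x - x_i| < (p+1)\lambda^{-1/(2m)}\}} O\{\lambda^{-1/(2m)}\} \Big] \\
& + \delta_{\{p > m\}} \Big[ O\{\lambda^{-2+1/(2m)}\} + \delta_{\{|x - x_i| < (3p+2-m)/K\}} O\{\lambda^{-p/(p-m)+1/(2m)}\} \Big] \\
& + O[nh_n \exp\{-C\lambda^{-1/(2m)} K \min(x, 1 - x)\}]. \qquad (5.6)
\end{aligned}$$

First we have

$$|E\{\hat\mu(x) - \mu^*(x)\}| \le (nh_n)^{-1} \sum_i |\mu(x_i) r_i(x)|. \qquad (5.7)$$

We study the right hand side of (5.7). For $r_i(x)$ defined in (5.6), the two terms $O\{\lambda^{-2+1/(2m)}\}$
and $O[nh_n \exp\{-C\lambda^{-1/(2m)} K \min(x, 1 - x)\}]$ are of order $o(\lambda^{-1/m})$. Also

$$(nh_n)^{-1} \sum_i |\mu(x_i)| \exp\Big( -\psi_0 \frac{|x - x_i|}{h_n} \Big) = O(1),$$

$$(nh_n)^{-1} \sum_i |\mu(x_i)| \exp\Big( -\psi_0 \frac{|x - x_i|}{h_n} \Big) \delta_{\{|x - x_i| < (p+1)\lambda^{-1/(2m)}\}} = O\{\lambda^{-1/(2m)}\},$$

$$(nh_n)^{-1} \sum_i |\mu(x_i)| \delta_{\{|x - x_i| < (3p+2-m)/K\}} = O\{(Kh_n)^{-1}\}.$$

It follows that $(nh_n)^{-1} \sum_i |\mu(x_i) r_i(x)| = O(\lambda^{-1/m})$. Next we derive that

$$\mathrm{var}\{\hat\mu(x) - \mu^*(x)\} = (nh_n)^{-2} \sum_i r_i^2(x) \sigma^2(x_i). \qquad (5.8)$$

With a derivation similar to the one above, we can establish that $(nh_n)^{-1} \sum_i r_i^2(x) \sigma^2(x_i) = o(1)$.
Therefore the proposition is proved.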

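Example 5.1. Consider the case $m = 2$. Denote the imaginary unit by $i$. Then $\psi_1 = (1 + i)/\sqrt{2}$
and $\psi_2 = (1 - i)/\sqrt{2}$. Hence the equivalent kernel for $x \in (0, 1)$ is

$$H_2(|x - \tilde x|) = \frac{\sqrt{2}}{4} e^{-|x - \tilde x|/\sqrt{2}} \Big\{ \cos\Big( \frac{|x - \tilde x|}{\sqrt{2}} \Big) + \sin\Big( \frac{|x - \tilde x|}{\sqrt{2}} \Big) \Big\}.$$

Example 5.2. Consider the case $m = 3$. Then $\psi_1 = 1$, $\psi_2 = (1 + \sqrt{3} i)/2$, $\psi_3 = (1 - \sqrt{3} i)/2$. Hence the
equivalent kernel for $x \in (0, 1)$ is

$$H_3(|x - \tilde x|) = \frac{1}{6} e^{-|x - \tilde x|} + \frac{1}{6} e^{-|x - \tilde x|/2} \Big\{ \cos\Big( \frac{\sqrt{3} |x - \tilde x|}{2} \Big) + \sqrt{3} \sin\Big( \frac{\sqrt{3} |x - \tilde x|}{2} \Big) \Big\}.$$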
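
The statement that $H_m$ is a kernel of order $2m$ can be probed numerically. The sketch below assumes the form $H_m(x) = \sum_{\nu} \psi_\nu (2m)^{-1} \exp(-\psi_\nu |x|)$ and computes even moments by Simpson's rule on a truncated half-line; the function names, the truncation point `upper` and the step count are illustrative choices. Odd moments vanish by symmetry and are not computed.

```python
import cmath
import math

def H(m, x):
    """Assumed equivalent kernel built from the roots of x**(2m) + (-1)**m = 0 with positive real part."""
    roots = [cmath.exp(1j * math.pi * (m + 1 + 2 * j) / (2 * m)) for j in range(2 * m)]
    return sum(p / (2 * m) * cmath.exp(-p * abs(x)) for p in roots if p.real > 1e-12).real

def even_moment(m, j, upper=80.0, steps=16000):
    """Approximate the integral of u**j * H_m(u) over the real line for even j."""
    if j % 2 != 0 or steps % 2 != 0:
        raise ValueError("j and steps must both be even")
    h = upper / steps
    f = lambda u: u ** j * H(m, u)
    total = f(0.0) + f(upper)
    total += 4 * sum(f((2 * i - 1) * h) for i in range(1, steps // 2 + 1))
    total += 2 * sum(f(2 * i * h) for i in range(1, steps // 2))
    # Simpson's rule on [0, upper], doubled because the integrand is even.
    return 2 * total * h / 3

mass = even_moment(2, 0)      # zeroth moment of H_2
second = even_moment(2, 2)    # second moment of H_2
fourth = even_moment(2, 4)    # first non-vanishing moment of H_2
```

For $m = 2$ the second moment vanishes and the fourth moment equals $(-1)^{m+1}(2m)! = -24$, which is the constant behind the bias term $(-1)^{m+1} h_n^{2m} \mu^{(2m)}(x)$.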

Proof of Theorem 3.1. Proposition 3.1 shows that the P-spline estimator is asymptotically
equivalent to a kernel regression estimator with the kernel function $H_m(x)$. Hence a standard
analysis of the kernel regression estimator, as in Wand and Jones (1995), with the kernel
function $H_m(x)$ gives the desired result. The detailed derivation is as follows. First,

$$E\{\mu^*(x)\} = \mu(x) + (-1)^{m+1} h_n^{2m} \mu^{(2m)}(x) + o(h_n^{2m})$$

and

$$\mathrm{var}\{\mu^*(x)\} = \frac{1}{(nh_n)^2} \sum_i \sigma^2(x_i) H_m^2\Big( \frac{x - x_i}{h_n} \Big) = \frac{1}{nh_n} \sigma^2(x) \int H_m^2(s)\,ds + o\{(nh_n)^{-1}\}.$$

By Proposition 3.1, we obtain

$$E\{\hat\mu(x)\} = \mu(x) + (-1)^{m+1} h_n^{2m} \mu^{(2m)}(x) + o(h_n^{2m}) + O\{(Kh_n)^{-2}\},$$

$$\mathrm{var}\{\hat\mu(x)\} = \frac{1}{nh_n} \sigma^2(x) \int_{-\infty}^{\infty} H_m^2(s)\,ds + o\{(nh_n)^{-1}\},$$
and the proof is straightforward by verifying that $h_n^{4m}$ and $(nh_n)^{-1}$ are of the same order and
$\lambda^{-1/m} = o(h_n^{2m})$.

5.2 The Boundary Case

By (4.11) and the derivation in Section 4.3, we have

$$\begin{aligned}
\hat\mu(x) &= \frac{1}{M} \sum_{i=1}^{n} y_i \Big[ \sum_{k,r} B_k(x) B_r(x_i) \{S_{k,r} + R_{k,r}(x)\} + b_{i,0}(x) \Big] \\
&= \frac{1}{M} \sum_{i=1}^{n} y_i \Big\{ \sum_{k,r} B_k(x) B_r(x_i) S_{k,r} + b_{i,0}(x) \Big\} \qquad (5.9) \\
&\quad + \frac{1}{M} \sum_{i=1}^{n} y_i \Big\{ \sum_{k,r} B_k(x) B_r(x_i) R_{k,r}(x) \Big\}. \qquad (5.10)
\end{aligned}$$

Note that $b_{i,0}(x) = O[\exp\{-C_0 \lambda^{-1/(2m)} K\}]$. The sum in (5.9) can be analyzed as
in Section 5.1 and we have

$$\frac{1}{M} \sum_{i=1}^{n} y_i \Big\{ \sum_{k,r} B_k(x) B_r(x_i) S_{k,r} + b_{i,0}(x) \Big\} = \frac{1}{nh_n} \sum_{i=1}^{n} y_i \Big[ H_m\Big( \frac{|x - x_i|}{h_n} \Big) + \exp\Big( -\psi_0 \frac{|x - x_i|}{h_n} \Big) O\{(Kh_n)^{-1}\} \Big].$$

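Now we focus on the second sum (denoted by $\hat\mu_b(x)$) in (5.10). Note that $R_{k,r}(x) = \sum_{\nu=1}^{q} \tilde a_{k,\nu} \rho_\nu^{r-1}$.
Note also that if $\nu > m$, $\rho_\nu = O\{\lambda^{-1/(p-m)}\}$ and (4.33) holds. Hence,

$$\hat\mu_b(x) = \frac{1}{2m\,nh_n} \sum_{i=1}^{n} y_i \Big[ \sum_{\nu=1}^{m} \sum_{r=1}^{c} \sum_{k=1}^{c} B_r(x_i) B_k(x) b_{k,\nu} \rho_\nu^{r-1} + O\{(Kh_n)^{-2}\} \Big].$$

By an analysis similar to that in Section 5.1 we obtain, aided by Lemma 9.9, that

$$\begin{aligned}
\hat\mu_b(x) &= \frac{1}{2m\,nh_n} \sum_{i=1}^{n} y_i \Big[ r^T\Big( \frac{x_i}{h_n} \Big) \Psi_{m,1}^{-1} \Psi_{m,2}\, r(c_x) + O\{(Kh_n)^{-2}\} \Big] \\
&= \frac{1}{2m\,nh_n} \sum_{i=1}^{n} y_i \Big[ r^T\Big( \frac{x_i}{h_n} \Big) \Psi_{m,1}^{-1} \Psi_{m,2}\, r\Big( \frac{x}{h_n} \Big) + O\{(Kh_n)^{-2}\} \Big].
\end{aligned}$$

Note that $\Psi_{m,1}$, $\Psi_{m,2}$ and $r(x)$ are defined in Section 4.3. In the above derivation, we used
the assumption that $x/h_n$ converges to $c_x$; we also used (4.30). We define the equivalent
kernel for $\hat\mu_b(x)$ as

$$H_{b,m}(x, \tilde x) = \frac{1}{2m} r(\tilde x)^T \Psi_{m,1}^{-1} \Psi_{m,2}\, r(x). \qquad (5.11)$$

Now we have

$$\hat\mu_b(x) = \frac{1}{nh_n} \sum_{i=1}^{n} y_i \Big[ H_{b,m}\Big( \frac{x}{h_n}, \frac{x_i}{h_n} \Big) + O\{(Kh_n)^{-2}\} \Big]. \qquad (5.12)$$

The above equality shows that when $x$ is near 0, a P-spline estimator is a kernel regression
estimator with the equivalent kernel

$$H_m(|x - \tilde x|) + H_{b,m}(x, \tilde x). \qquad (5.13)$$

Next we provide two specific examples of (5.13).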

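Example 5.3. Consider the case $m = 2$. It can be shown that

$$\Psi_{m,1} = \begin{pmatrix} i & -i \\ \frac{-1+i}{\sqrt{2}} & \frac{-1-i}{\sqrt{2}} \end{pmatrix}, \qquad \Psi_{m,2} = \begin{pmatrix} \frac{1-i}{\sqrt{2}} & \frac{1+i}{\sqrt{2}} \\ -1 & -1 \end{pmatrix}$$

and

$$r(x) = e^{-x/\sqrt{2}} \begin{pmatrix} \cos(x/\sqrt{2}) - i \sin(x/\sqrt{2}) \\ \cos(x/\sqrt{2}) + i \sin(x/\sqrt{2}) \end{pmatrix}.$$

Hence,

$$H_{b,2}(x, \tilde x) = \frac{\sqrt{2}}{4} e^{-(x + \tilde x)/\sqrt{2}} \Big\{ \cos\Big( \frac{x - \tilde x}{\sqrt{2}} \Big) + 2 \cos\Big( \frac{x}{\sqrt{2}} \Big) \cos\Big( \frac{\tilde x}{\sqrt{2}} \Big) - \sin\Big( \frac{x + \tilde x}{\sqrt{2}} \Big) \Big\}.$$

It follows that the equivalent kernel for $x$ near 0 is

$$\frac{\sqrt{2}}{4} e^{-|x - \tilde x|/\sqrt{2}} \Big\{ \cos\Big( \frac{|x - \tilde x|}{\sqrt{2}} \Big) + \sin\Big( \frac{|x - \tilde x|}{\sqrt{2}} \Big) \Big\} + \frac{\sqrt{2}}{4} e^{-(x + \tilde x)/\sqrt{2}} \Big\{ \cos\Big( \frac{x - \tilde x}{\sqrt{2}} \Big) + 2 \cos\Big( \frac{x}{\sqrt{2}} \Big) \cos\Big( \frac{\tilde x}{\sqrt{2}} \Big) - \sin\Big( \frac{x + \tilde x}{\sqrt{2}} \Big) \Big\}.$$

When $x = 0$, the equivalent kernel becomes

$$\sqrt{2}\, e^{-\tilde x/\sqrt{2}} \cos(\tilde x/\sqrt{2}),$$

which coincides with the equivalent kernel for smoothing splines (Silverman, 1984).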
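
The boundary kernel can also be evaluated numerically for general $m$. The sketch below assumes the definitions used in Section 4.3 and (5.11): $\Psi_{m,1}$ has $(i, j)$th element $\psi_j^{m+i-1}$, $\Psi_{m,2}$ has $(i, j)$th element $(-1)^{m+i} \psi_j^{m+i}$, $r(x) = (e^{-\psi_1 x}, \dots, e^{-\psi_m x})^T$ and $H_{b,m}(x, \tilde x) = (2m)^{-1} r(\tilde x)^T \Psi_{m,1}^{-1} \Psi_{m,2} r(x)$. All function names are hypothetical, and the small Gaussian elimination routine merely stands in for a linear algebra library.

```python
import cmath
import math

def kernel_roots(m):
    """Roots of x**(2m) + (-1)**m = 0 with positive real part."""
    roots = [cmath.exp(1j * math.pi * (m + 1 + 2 * j) / (2 * m)) for j in range(2 * m)]
    return [p for p in roots if p.real > 1e-12]

def solve(A, b):
    """Solve A z = b for complex entries by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for cc in range(col, n + 1):
                M[r][cc] -= f * M[col][cc]
    z = [0j] * n
    for i in reversed(range(n)):
        z[i] = (M[i][n] - sum(M[i][j] * z[j] for j in range(i + 1, n))) / M[i][i]
    return z

def H(m, x):
    """Interior kernel, assumed to be sum of psi/(2m) * exp(-psi*|x|)."""
    return sum(p / (2 * m) * cmath.exp(-p * abs(x)) for p in kernel_roots(m)).real

def boundary_kernel(m, x, xt):
    """H_{b,m}(x, xt) = r(xt)^T Psi1^{-1} Psi2 r(x) / (2m), under the stated assumptions."""
    psi = kernel_roots(m)
    Psi1 = [[p ** (m + i - 1) for p in psi] for i in range(1, m + 1)]
    Psi2 = [[(-1) ** (m + i) * p ** (m + i) for p in psi] for i in range(1, m + 1)]
    r_x = [cmath.exp(-p * x) for p in psi]
    rhs = [sum(Psi2[i][j] * r_x[j] for j in range(m)) for i in range(m)]
    b = solve(Psi1, rhs)  # b = Psi1^{-1} Psi2 r(x)
    return (sum(cmath.exp(-p * xt) * bj for p, bj in zip(psi, b)) / (2 * m)).real

def equivalent_kernel(m, x, xt):
    """Equivalent kernel near the left boundary: H_m(|x - xt|) + H_{b,m}(x, xt)."""
    return H(m, abs(x - xt)) + boundary_kernel(m, x, xt)

value_at_boundary = equivalent_kernel(2, 0.0, 0.9)
```

At $x = 0$ and $m = 2$ the computed value agrees with $\sqrt{2} e^{-\tilde x/\sqrt{2}} \cos(\tilde x/\sqrt{2})$, and the result does not depend on the order in which the roots are listed, since the columns of both matrices and the entries of $r$ are permuted together.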



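Example 5.4. Consider the case $m = 3$. It can be shown that

$$\Psi_{m,1} = \begin{pmatrix} 1 & -1 & -1 \\ 1 & \frac{-1-\sqrt{3}i}{2} & \frac{-1+\sqrt{3}i}{2} \\ 1 & \frac{1-\sqrt{3}i}{2} & \frac{1+\sqrt{3}i}{2} \end{pmatrix}, \qquad \Psi_{m,2} = \begin{pmatrix} 1 & \frac{-1-\sqrt{3}i}{2} & \frac{-1+\sqrt{3}i}{2} \\ -1 & \frac{-1+\sqrt{3}i}{2} & \frac{-1-\sqrt{3}i}{2} \\ 1 & 1 & 1 \end{pmatrix}$$

and

$$r(x) = \begin{pmatrix} e^{-x} \\ e^{-x/2} \{\cos(\sqrt{3}x/2) - i \sin(\sqrt{3}x/2)\} \\ e^{-x/2} \{\cos(\sqrt{3}x/2) + i \sin(\sqrt{3}x/2)\} \end{pmatrix}.$$

It follows that the equivalent kernel for $x$ near 0 is

$$\begin{aligned}
& \frac{1}{6} e^{-|x - \tilde x|} + \frac{1}{6} e^{-|x - \tilde x|/2} \Big\{ \cos\Big( \frac{\sqrt{3}|x - \tilde x|}{2} \Big) + \sqrt{3} \sin\Big( \frac{\sqrt{3}|x - \tilde x|}{2} \Big) \Big\} \\
& + \frac{1}{2} e^{-(x + \tilde x)} + \frac{1}{3} e^{-\tilde x - x/2} \Big\{ \cos\Big( \frac{\sqrt{3}x}{2} \Big) - \sqrt{3} \sin\Big( \frac{\sqrt{3}x}{2} \Big) \Big\} + \frac{1}{3} e^{-x - \tilde x/2} \Big\{ \cos\Big( \frac{\sqrt{3}\tilde x}{2} \Big) - \sqrt{3} \sin\Big( \frac{\sqrt{3}\tilde x}{2} \Big) \Big\} \\
& + \frac{1}{6} e^{-(x + \tilde x)/2} \Big\{ 3 \cos\Big( \frac{\sqrt{3}(\tilde x - x)}{2} \Big) - \sqrt{3} \sin\Big( \frac{\sqrt{3}(\tilde x + x)}{2} \Big) + 2 \sin\Big( \frac{\sqrt{3}x}{2} \Big) \sin\Big( \frac{\sqrt{3}\tilde x}{2} \Big) \Big\}.
\end{aligned}$$

When $x = 0$, the equivalent kernel becomes

$$e^{-\tilde x} + e^{-\tilde x/2} \Big\{ \cos\Big( \frac{\sqrt{3}\tilde x}{2} \Big) - \frac{\sqrt{3}}{3} \sin\Big( \frac{\sqrt{3}\tilde x}{2} \Big) \Big\}.$$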



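Proof of Theorem 3.2. Similar to the proof of Theorem 3.1, we can derive that

$$\begin{aligned}
E\{\hat\mu(x)\} &= \frac{1}{nh_n} \sum_{i=1}^{n} \mu(x_i) \Big\{ H_m\Big( \frac{|x - x_i|}{h_n} \Big) + H_{b,m}\Big( \frac{x}{h_n}, \frac{x_i}{h_n} \Big) + \exp\Big( -\psi_0 \frac{|x - x_i|}{h_n} \Big) O\Big( \frac{1}{Kh_n} \Big) \Big\} \\
&= \frac{1}{h_n} \int_0^1 \mu(u) \Big\{ H_m\Big( \frac{|x - u|}{h_n} \Big) + H_{b,m}\Big( \frac{x}{h_n}, \frac{u}{h_n} \Big) \Big\}\,du + O\Big( \frac{1}{Kh_n} \Big) \\
&= \int_{-\infty}^{c_x} \mu(x - h_n v) \{H_m(v) + H_{b,m}(c_x, c_x - v)\}\,dv + O\{(Kh_n)^{-1}\}
\end{aligned}$$

and

$$\begin{aligned}
\mathrm{var}\{\hat\mu(x)\} &= \frac{1}{(nh_n)^2} \sum_{i=1}^{n} \sigma^2(x_i) \Big\{ H_m\Big( \frac{|x - x_i|}{h_n} \Big) + H_{b,m}\Big( \frac{x}{h_n}, \frac{x_i}{h_n} \Big) + \exp\Big( -\psi_0 \frac{|x - x_i|}{h_n} \Big) O\Big( \frac{1}{Kh_n} \Big) \Big\}^2 \\
&= \frac{1 + o(1)}{nh_n} \sigma^2(x) \int_{-\infty}^{c_x} \{H_m(v) + H_{b,m}(c_x, c_x - v)\}^2\,dv. \qquad (5.14)
\end{aligned}$$

By Proposition 5.2 below, we have

$$E\{\hat\mu(x)\} = \mu(x) + (-1)^{m} \frac{h_n^m \mu^{(m)}(x)}{m!} \int_{-\infty}^{c_x} v^m \{H_m(v) + H_{b,m}(c_x, c_x - v)\}\,dv + o(h_n^{m}) + O\{(Kh_n)^{-1}\}. \qquad (5.15)$$

Combining (5.14) with (5.15), Theorem 3.2 is proved.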
Proposition 5.2. For any fixed constant $t > 0$,

$$\int_{-\infty}^{t} x^{\ell} \{H_m(x) + H_{b,m}(t, t-x)\}\,dx = 0, \qquad \ell = 1, \ldots, m-1,$$

and

$$\int_{-\infty}^{t} x^{m} \{H_m(x) + H_{b,m}(t, t-x)\}\,dx \neq 0.$$



Proof of Proposition 5.2. By Lemma 9.10, we can show that



x'H m (x)dx = ~Y,Y, 7JIT^W*" le "*"' 



2m 
and 



^+1 yi-jfc+l ^ ^ 



x r 



Because H^ m {t,t — x) = (2m) 1 r(t — x) T ^ r m 1 1 ^ , mj2 r(t), it suffices to prove that 

(^-\ . . . , t^" 1 ) + (-l) fc . . . , ^* ) *i* m , 2 = T , fc = 1, . . . , m. (5.16) 

Let wl = (-l) m+1 . . . , x/j^j *~^* TO>2 . Then w fc is the (m + l-/c)th row of * m , 2 . Hence, 
for A; = 1, . . . , m, 

which proves (5.16). For $\ell = m$, we have

/"* — m! 

/ x rn {H m (x) + H b>m (t, t-x)}dx = — — w m+1 r(t), 
J-oc 2m 

where w^ +1 = • • • , «) + (-l) m+1 ^T + \ * m !i* m , 2 . Note that •••,«) 

(_l)m+i . . . is the first row of * TO>1 , hence 



~ T 



W 



m+1 



(^•••,C) + (-i) m+1 (C-.tft 



= 2(-ir +i (^r,---,o 

which finishes the proof. 




6 Irregularly Spaced Data 

Suppose the design points $x = \{x_1, \ldots, x_n\}$ are independent and sampled from a distribution $F(x)$ on $[0,1]$. Suppose $F(x)$ is twice continuously differentiable with derivative $f(x)$, and $f(x)$ is positive over $[0,1]$. For unequally spaced design points, the asymptotic analysis in Section 5 does not hold. Instead of pursuing the challenging task of analyzing the P-splines fitted to irregularly spaced data directly, we first bin the data. We partition $[0,1]$ into $I$ intervals of equal length, and let $\bar{y}_k$ be the mean of all $y_i$ such that $x_i$ is in the $k$th bin. If the $k$th bin has no data point, we let $\bar{y}_k$ be 0. Here we assume $I \sim c_I n^{\tau_I}$ for some constants $c_I$ and $\tau_I < 1$. Treating $\bar{y}_k$ as the data point at $\bar{x}_k$, the center of the $k$th bin, we apply P-splines to the binned data $\{\bar{y}_k\}_{1 \le k \le I}$ to get

$$\hat{\theta} = \Lambda^{-1} B^T \bar{y}/M.$$

Then the penalized estimate is defined as

$$\hat{\mu}(x) = \sum_{k=1}^{c} \hat{\theta}_k B_k(x). \qquad (6.1)$$

Note that the practice of binning data in penalized splines also appears in Wang and Shen (2010). The asymptotic distribution of $\hat{\mu}(x)$ in (6.1) can be derived similarly as in Section 5.
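The binning step described above can be sketched as follows. This is a minimal illustration assuming design points in $[0,1]$; the function name `bin_means` and its return values are hypothetical and not taken from the paper.

```python
import numpy as np

def bin_means(x, y, num_bins):
    """Bin (x, y) on [0, 1] into equal-width bins.

    Returns bin centers, bin means (0 for empty bins, as in the text),
    and the number of points per bin.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Bin index of each point; x = 1 is assigned to the last bin.
    idx = np.minimum((x * num_bins).astype(int), num_bins - 1)
    counts = np.bincount(idx, minlength=num_bins)
    sums = np.bincount(idx, weights=y, minlength=num_bins)
    means = np.divide(sums, counts, out=np.zeros(num_bins), where=counts > 0)
    centers = (np.arange(num_bins) + 0.5) / num_bins
    return centers, means, counts

# Small usage example with four points and ten bins.
centers, means, counts = bin_means([0.05, 0.15, 0.12, 0.95], [1.0, 2.0, 4.0, 8.0], 10)
print(means)   # bin 0 has mean 1, bin 1 has mean 3, bin 9 has mean 8, others 0
print(counts)
```

The bin centers and bin means then play the role of equally spaced design points and responses in the P-spline fit.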

Theorem 6.1. Let $\sigma^2(x) = \mathrm{var}(y \mid X = x)$. Assume $\tau_I > \max(\tau, 1/2)$ and conditions (1)-(4) in Proposition 3.1 hold. Furthermore, assume $\sigma^2(x)$ has a continuous second derivative. For $x \in (0,1)$, with the same notation and assumptions as in Theorem 3.1, we have that

$$n^{2m/(4m+1)} \{\hat{\mu}(x) - \mu(x)\} \Rightarrow N\{\tilde{\mu}(x), V(x)/f(x)\}$$

in distribution as $n \to \infty$, where $\tilde{\mu}(x)$ is defined in (3.1) and $V(x)$ is defined in (3.2).

Remark 6.1. The above theorem holds for the fixed design as well, and the assumption required for the design points is an analogue of (6.4): $\sup_k |n_k/(nI^{-1}) - f(\bar{x}_k)| = o(1)$.

Proof of Theorem 6.1. By a similar analysis as in Section 5, applied to the binned data $\bar{y}$ and with $n$ replaced by $I$, we obtain

/*(*) = ^ I> {n m (^^) + n(x) } , 
where



r k (x) =exp I -^ \^ j [O (\^ m ) + S {m=1} 6 { ^ k ^ 1)x . 1/l2m)} { A" 1 ^) } 



+ 5 



(p>m) 



O (A 2+ 2mj + 5{| a; _ % | < (3p + 2- m )/if}0 |A + 



Then 



and 



+ O [J/i n exp{-CA- 1/(2m) Kmin(x, 1 - x)}] . 

E{/i(z)|x} = {Ih n )- 1 J2 E {Vkk} \H m ( | ai r: 



var |x} = (Ih n ) 2 ^ var {Vk\x} \ H, 

k 

For simplicity, we let 



m i — : I +r k (x) 



(6.2) 



(6.3) 



G k = H m {h n l (x - x k )} + 6fe(a;). 
Let $n_k$ be the number of data points in the $k$th bin; then

$$\mathrm{var}\{\bar{y}_k \mid x\} = n_k^{-2} \sum_{i=1}^{n} \sigma^2(x_i)\, \delta_{\{|x_i - \bar{x}_k| \le (2I)^{-1}\}}.$$

So $\mathrm{var}\{\sqrt{n_k}\,\bar{y}_k \mid x\}$ is a Nadaraya-Watson kernel regression estimator of the conditional variance function $\sigma^2(x)$ at $\bar{x}_k$. Similarly, $n_k/(nI^{-1})$ is a kernel density estimator of $f(x)$ at $\bar{x}_k$. By the uniform convergence theory for kernel density estimators and Nadaraya-Watson kernel regression estimators (see, for instance, Hansen (2008)),

$$\sup_k \left| n_k/(nI^{-1}) - f(\bar{x}_k) \right| = O_p\left( \sqrt{I \ln n / n} + I^{-2} \right) = o_p(1) \qquad (6.4)$$

and

$$\sup_k \left| \mathrm{var}\{\sqrt{n_k}\,\bar{y}_k \mid x\} - \sigma^2(\bar{x}_k) \right| = O_p\left( \sqrt{I \ln n / n} + I^{-2} \right) = o_p(1).$$



It follows that

$$\sup_k \left| \frac{n}{I}\, \mathrm{var}\{\bar{y}_k \mid x\} - \frac{\sigma^2(\bar{x}_k)}{f(\bar{x}_k)} \right| = o_p(1). \qquad (6.5)$$

Then by (6.5) and (6.3),



var 



cr 2 (x 



nh n Ih n ^ f(x K ) 



° p(1) ^G^o,!^)- 1 } 



nh n Ih n 
and hence

var {/2(x)|x} = — !— -77-y + Op^nKY 1 } . (6.6) 
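where $V(x)$ is defined in (3.2). Because

$$E\{\bar{y}_k \mid x\} = n_k^{-1} \sum_{i=1}^{n} \mu(x_i)\, \delta_{\{|x_i - \bar{x}_k| \le (2I)^{-1}\}},$$

we can derive by (6.4) that

$$\sup_k \left| E\{\bar{y}_k \mid x\} - \mu(\bar{x}_k) \right| = O_p(I^{-1}).$$

Hence by (6.2),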

= o P (/- 1 ), 

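and hence

$$E\{\hat{\mu}(x) \mid x\} = \mu(x) + n^{-2m/(4m+1)} \tilde{\mu}(x) + o_p\left( n^{-2m/(4m+1)} \right), \qquad (6.7)$$

where $\tilde{\mu}(x)$ is defined in (3.1). With (6.6) and (6.7), we can derive that

$$n^{2m/(4m+1)} \left[ \hat{\mu}(x) - E\{\hat{\mu}(x) \mid x\} \right] \Rightarrow N\{0, V(x)/f(x)\} \qquad (6.8)$$

in distribution and

$$n^{2m/(4m+1)} \left[ E\{\hat{\mu}(x) \mid x\} - \mu(x) \right] = \tilde{\mu}(x) + o_p(1). \qquad (6.9)$$

Equalities (6.8) and (6.9) together prove the theorem.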

7 An Example 

We illustrate the idea of binning data using the LIDAR (light detection and ranging) data. The LIDAR data were analyzed in Holst et al. (1996) and Ruppert et al. (1997). The data set has 221 data points, and further details can be found in Ruppert et al. (2003). We fit the response, logratio, as a function of the predictor, range. First, we fit the data using cubic P-splines with a second-order penalty, using 35 equidistant knots as suggested in Ruppert et al. (2003). Then, we fit the binned data using cubic P-splines with a second-order penalty. The number of bins is 60 and we use 15 equidistant knots. The result is given in Figure 1. We can see that the two fitted curves are similar, with the biggest difference occurring when the predictor, range, is around 650.
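A P-spline fit of the kind used in this example can be sketched as below. This is a minimal illustration on synthetic data, not the LIDAR analysis itself: it solves the normal equations $(B^TB + \lambda^* D^TD)\theta = B^Ty$ with an equidistant-knot B-spline basis built by the Cox-de Boor recursion. The function names, the synthetic linear response and the value of `lam_star` are assumptions made for the illustration.

```python
import numpy as np

def bspline_basis(x, num_intervals, degree):
    """Design matrix of the K + p equidistant-knot B-splines on [0, 1)."""
    x = np.asarray(x, dtype=float)
    K, p = num_intervals, degree
    knots = (np.arange(K + 2 * p + 1) - p) / K
    # Degree 0: indicators of the half-open knot intervals.
    B = np.zeros((x.size, K + 2 * p))
    for j in range(K + 2 * p):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    # Cox-de Boor recursion, raising the degree one step at a time.
    for d in range(1, p + 1):
        B_next = np.zeros((x.size, K + 2 * p - d))
        for j in range(K + 2 * p - d):
            left = (x - knots[j]) / (knots[j + d] - knots[j]) * B[:, j]
            right = (knots[j + d + 1] - x) / (knots[j + d + 1] - knots[j + 1]) * B[:, j + 1]
            B_next[:, j] = left + right
        B = B_next
    return B

def fit_pspline(x, y, num_intervals, degree=3, penalty_order=2, lam_star=1.0):
    """Coefficients minimizing ||y - B theta||^2 + lam_star * ||D theta||^2."""
    B = bspline_basis(x, num_intervals, degree)
    # D is the (c - m) x c matrix of m-th order differences.
    D = np.diff(np.eye(B.shape[1]), n=penalty_order, axis=0)
    return np.linalg.solve(B.T @ B + lam_star * D.T @ D, B.T @ np.asarray(y, dtype=float))

# Synthetic check: a linear response is reproduced exactly under a second-order penalty.
n = 210
x = (2 * np.arange(1, n + 1) - 1) / (2 * n)
y = 2.0 * x + 1.0
theta = fit_pspline(x, y, num_intervals=15, degree=3, penalty_order=2, lam_star=10.0)
max_error = np.max(np.abs(bspline_basis(x, 15, 3) @ theta - y))
print(max_error)
```

To mimic the binned fit, the same function can be applied to bin centers and bin means in place of the raw design points and responses.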
E {fi(x) |x} - — KxjGk
n k 




Figure 1: The fitted curves of the response, log ratio, as a function of the predictor, range. 
The solid line is the fitted P-splines without binning the data, and the dashed line is the 
fitted P-splines after binning the data. The solid dots are the observed data. 



8 Discussion 

We have concentrated on the asymptotics of penalized splines estimation. In contrast to 
smoothing splines, penalized splines allow us to choose the number of knots, the degree of 
splines and the penalty independently. Our study provides theoretical guidelines on how to 
choose them. In our setting, the penalty λ plays the role of a smoothing parameter and the optimal order for λ is provided. The number of knots K is not important as long as it exceeds a given bound. The choice of the degree of splines does not affect the asymptotic distribution. Our results indicate that the performance of penalized splines estimation is similar to that of smoothing splines estimation (Silverman, 1984) and a class of kernel estimators (Messer and Goldstein, 1993). Furthermore, penalized splines have a slower convergence rate at the boundary than in the interior.



9 Some Lemmas 



Lemma 9.1. The coefficients $\hat{\theta}$ defined in (1.2) satisfy $\hat{\theta}_k = \sum_{i=1}^{n} d_{i,k}\, y_i$ with $d_{i,k} = o(1)$, $1 \le k \le c$.

Proof of Lemma 9.1. It suffices to show that every element of the matrix $(B^TB + \lambda^* D^TD)^{-1}B^T$ is $o(1)$. Because every column of $B^T$ contains at most $p+1$ non-zero elements that sum to 1 by Lemma 9.2, it suffices to show that every element of the matrix $M^{-1}\Lambda^{-1} = (B^TB + \lambda^* D^TD)^{-1}$ is $o(1)$. Since $\Lambda^{-1}$ is positive-definite, it suffices to show that the diagonal elements of $M^{-1}\Lambda^{-1}$ are $o(1)$. For $1 \le i \le c$, the $i$th diagonal element is bounded by the largest eigenvalue of $M^{-1}\Lambda^{-1}$, which is no larger than the largest eigenvalue of $(B^TB)^{-1}$ since $D^TD$ is positive semi-definite. By Lemma 2 in Zhou et al. (1998), the eigenvalues of $(B^TB)^{-1}$ are $O(K/n)$. Hence the diagonal elements of $M^{-1}\Lambda^{-1}$ are all $O(K/n) = o(1)$.

Lemma 9.2. The B-splines satisfy $\sum_{k=1}^{c} B_k(x) = 1$ for any $x \in (0,1)$.

See page 201 in de Boor (1978). 

Lemma 9.3. The B-splines with degree at least 1 satisfy $\sum_{k=1}^{K+p} B_k(x)\{Kx - k + (p+1)/2\} = 0$ for any $x \in (0,1)$.

Proof of Lemma 9.3. By Lemma 9.2, $\sum_{k=1}^{K+p} B_k(x)\{Kx - k + (p+1)/2\} = 0$ is equivalent to

$$\sum_{k=1}^{K+p} B_k(x)\, k = Kx + (p+1)/2. \qquad (9.1)$$

We shall prove (9.1) by induction on $p$. Assume $p = 1$. Let $k_x$ be the integer such that $x \in [k_x/K, (k_x+1)/K)$. Then $B_{k_x+1}(x) = -Kx + k_x + 1$ and $B_{k_x+2}(x) = Kx - k_x$. It follows that

$$\sum_{k=1}^{K+1} B_k(x)\, k = (-Kx + k_x + 1)(k_x + 1) + (Kx - k_x)(k_x + 2)$$
$$= (Kx - k_x)(k_x + 2 - k_x - 1) + (k_x + 1)$$
$$= Kx + 1.$$
Assume now the degree of the B-splines is $p$. We use $B_k^{[p]}(x)$ to denote the $k$th B-spline of degree $p$. We use the recursive relation of de Boor,

$$B_k^{[p]}(x) = \frac{K}{p}\left(x - \frac{k-p-1}{K}\right) B_{k-1}^{[p-1]}(x) + \frac{K}{p}\left(\frac{k}{K} - x\right) B_k^{[p-1]}(x)$$
$$= \frac{1}{p}\left\{ (Kx - k + p + 1)\, B_{k-1}^{[p-1]}(x) - (Kx - k)\, B_k^{[p-1]}(x) \right\}. \qquad (9.2)$$

It follows that

$$p \sum_{k=1}^{K+p} B_k^{[p]}(x)\, k = \sum_{k=1}^{K+p} \left\{ (Kx - k + p + 1)\, B_{k-1}^{[p-1]}(x) - (Kx - k)\, B_k^{[p-1]}(x) \right\} k$$
$$= \sum_{k=2}^{K+p} B_{k-1}^{[p-1]}(x)(Kx - k + p + 1)\, k - \sum_{k=1}^{K+p-1} B_k^{[p-1]}(x)(Kx - k)\, k$$
$$= \sum_{k=1}^{K+p-1} B_k^{[p-1]}(x)(Kx - k + p)(k + 1) - \sum_{k=1}^{K+p-1} B_k^{[p-1]}(x)(Kx - k)\, k$$
$$= \sum_{k=1}^{K+p-1} B_k^{[p-1]}(x)(Kx - k + p + pk)$$
$$= Kx + p + (p-1) \sum_{k=1}^{K+p-1} B_k^{[p-1]}(x)\, k$$
$$= Kx + p + (p-1)(Kx + p/2)$$
$$= p\{Kx + (p+1)/2\},$$

which is (9.1). Therefore, Lemma 9.3 is proved.

Lemma 9.4. Let $M = n/K$ be an integer. Let $\{B_1(x), \ldots, B_c(x)\}$, where $c = K + p$, be the B-spline basis with knots $\{-p/K, -(p-1)/K, \ldots, 0/K, 1/K, \ldots, K/K\}$. Then for $k = q+1, \ldots, K$,

$$\sum_{i=1}^{n} B_k(x_i) = M.$$

Proof of Lemma 9.4. Proof by induction on $p$. Consider $p = 0$. $B_k^{[0]}(x) = 1$ if $x \in [k/K, (k+1)/K)$ and is 0 otherwise. So for fixed $k$, $B_k^{[0]}(x_i) = 1$ if and only if $(i - 1/2)/n \in [k/K, (k+1)/K)$, i.e., if and only if $i = nk/K + 1, nk/K + 2, \ldots, n(k+1)/K$. Hence the case $p = 0$ is proved. Now consider $p \ge 1$. By the recursive relation of de Boor in (9.2),



P P 

M(p+1) if /-A n b»-i]/ \ V"V , t/^dM 



p p 



i=l i=l 



^ V \ i=l r=l J 



M(P+1) + 1^ B , ,1 

p p ~^ 

M(p+1) 1 M 



p p 

=M. 

So Lemma 9.4 is proved.
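Lemmas 9.2 to 9.4 can be checked numerically with a small script. The sketch below builds an equidistant-knot B-spline basis by the Cox-de Boor recursion and evaluates it at the design points $x_i = (2i-1)/(2n)$. The function name, the values of $K$, $p$, $n$, and the reading of the index range in Lemma 9.4 as the splines supported entirely inside $[0,1]$ (that is, $k = p+1, \ldots, K$ with 1-based indexing) are assumptions of this illustration.

```python
import numpy as np

def bspline_basis(x, num_intervals, degree):
    """Design matrix of the K + p equidistant-knot B-splines on [0, 1)."""
    x = np.asarray(x, dtype=float)
    K, p = num_intervals, degree
    knots = (np.arange(K + 2 * p + 1) - p) / K
    B = np.zeros((x.size, K + 2 * p))
    for j in range(K + 2 * p):
        B[:, j] = (knots[j] <= x) & (x < knots[j + 1])
    for d in range(1, p + 1):
        B_next = np.zeros((x.size, K + 2 * p - d))
        for j in range(K + 2 * p - d):
            left = (x - knots[j]) / (knots[j + d] - knots[j]) * B[:, j]
            right = (knots[j + d + 1] - x) / (knots[j + d + 1] - knots[j + 1]) * B[:, j + 1]
            B_next[:, j] = left + right
        B = B_next
    return B

K, p, n = 5, 3, 40                      # M = n / K = 8 is an integer
x = (2 * np.arange(1, n + 1) - 1) / (2 * n)
B = bspline_basis(x, K, p)
k = np.arange(1, K + p + 1)             # 1-based spline index as in the text

row_sums = B.sum(axis=1)                # Lemma 9.2: equal to 1
index_means = B @ k                     # (9.1): equal to K x + (p + 1) / 2
interior_column_sums = B.sum(axis=0)[p:K]  # splines k = p+1, ..., K: equal to M

print(row_sums[:3], index_means[:3] - K * x[:3], interior_column_sums)
```

With these settings the row sums are 1, the index means differ from $Kx$ by the constant $(p+1)/2 = 2$, and the interior column sums equal $M = 8$.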
Lemma 9.5. $P(1) = 1$, $P'(1) = p$.

Proof of Lemma 9.5. The expression of $P(x)$ in (4.3) is rewritten here,

$$P(x) = u_p + u_{p-1}x + \cdots + u_0 x^p + u_1 x^{p+1} + \cdots + u_p x^{2p}.$$

Hence, $P(1) = 2\sum_{i=1}^{p} u_i + u_0$ and $P'(1) = p\left(2\sum_{i=1}^{p} u_i + u_0\right)$, so we only need to show that $2\sum_{i=1}^{p} u_i + u_0 = 1$. Let $C = B^TB/M$. By ([P]), if $p < i < c - p$, then the coefficient vector $(u_p, u_{p-1}, \ldots, u_0, u_1, \ldots, u_p)^T$ equals $(C_{i,i-p}, C_{i,i-p+1}, \ldots, C_{i,i}, C_{i,i+1}, \ldots, C_{i,i+p})^T$. Thus, $2\sum_{i=1}^{p} u_i + u_0 = \sum_{|i-j| \le p} C_{i,j} = \sum_j C_{i,j}$ because $C_{i,j} = 0$ if $|i-j| > p$. Since $C_{i,j} = \sum_r B_i(x_r)B_j(x_r)/M$, we have $2\sum_{i=1}^{p} u_i + u_0 = \sum_r \{B_i(x_r) \sum_j B_j(x_r)\}/M = \sum_r B_i(x_r)/M = 1$, where the second equality uses Lemma 9.2 and the last equality holds by Lemma 9.4.



Lemma 9.6. If $\{\psi_1, \ldots, \psi_m\}$ are the $m$ roots of $x^{2m} + (-1)^m = 0$ satisfying that the real part of $\psi_\nu$ is positive, then

$$\prod_{j \neq \nu} (\psi_\nu^2 - \psi_j^2) = (-1)^{m+1}\, m\, \psi_\nu^{-2}. \qquad (9.3)$$

Proof of Lemma 9.6. It is easy to see that $\{\psi_1^2, \ldots, \psi_m^2\}$ are the $m$ roots of $x^m + (-1)^m = 0$. Thus, $\prod_{j=1}^{m} (x - \psi_j^2) = x^m + (-1)^m$. Taking the derivative of $\prod_{j=1}^{m} (x - \psi_j^2)$ with respect to $x$ and letting $x = \psi_\nu^2$ gives (9.3).
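Identity (9.3) is easy to confirm numerically. The sketch below computes the roots of $x^{2m} + (-1)^m = 0$ with positive real part in closed form and compares both sides of (9.3); the function names are hypothetical and the range of $m$ tested is an arbitrary choice.

```python
import numpy as np

def roots_with_positive_real_part(m):
    """The m roots of x^(2m) + (-1)^m = 0 whose real part is positive."""
    k = np.arange(2 * m)
    # x^(2m) = (-1)^(m+1) = exp(i*pi*(m+1)), so the roots are these 2m points.
    all_roots = np.exp(1j * np.pi * (m + 1 + 2 * k) / (2 * m))
    return all_roots[all_roots.real > 1e-9]

def lemma_9_6_sides(m):
    """Left- and right-hand sides of (9.3) for every root index."""
    psi = roots_with_positive_real_part(m)
    lhs = np.array([
        np.prod([psi[v] ** 2 - psi[j] ** 2 for j in range(m) if j != v])
        for v in range(m)
    ])
    rhs = (-1.0) ** (m + 1) * m / psi ** 2
    return lhs, rhs

checks = {m: np.allclose(*lemma_9_6_sides(m)) for m in range(1, 7)}
print(checks)
```

For $m = 1$ the product is empty and both sides equal 1.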

Lemma 9.7. Suppose $g(x) = \exp(-b|x|)$ with $b \neq 0$.



where 



^2B k (x)B r (xi)g(- 

k.r 



q{x^ Xi 



2 Sfc<r B k (x)B r (xi){r -k) if x > x h 



(9.4) 



Proof of Lemma 9.7. Suppose that $x > x_i$. Take a Taylor expansion of $g(x)$ at the point



/in ' 



g{ ^kX L) = g(l^i) (l - A(| Xfc - Xr | - | X - X .|) + 0{(^ n )- 2 } 



' ' 1 1 1 - t^-(|A; - r| - Kx + Kxi) + 0{(Kh n )- 2 } } . 



h, 



Khr. 



Hence if we drop the term g( 9L j^ i )0{(Kh n ) 2 } in the above equality, 



y^ j B k (x)B r (x i )g( 



k.r 



hr, 



g( 



g( 



g( 



g( 



Hr. 



hr. 



l -)Y,B k (x)B r ( Xl ) jl - -^-{\k -r\-Kx + K Xi )\ 

k,r K n J 

^2B k (x)B r (xi)(\k - r\ - Kx + Kxi) > 

k,r ) 

J2 B k{x)B r (xi) (\k-r\-k + ^^ + Kxi 

k,r ^ 

J2 B k^)B r (xi)(\k-r\+r-k)[ 

k,r ) 



h n 



1 - 



Kh n 
b 

Kh n 
b 

Kh n 
2b 

Khn 



Y,B k {x)B r { Xi ){r-k) 



k<r 
Note that in the above derivation, we used Lemmas 9.2 and 9.3. The other case when $x < x_i$ can be similarly proved.

Lemma 9.8. The function $q$ defined in (9.4) satisfies

$$q(x, x_i) = 0 \quad \text{if } |x - x_i| > (p+1)/K.$$

Proof of Lemma 9.8. Suppose $x > x_i$. When $x - x_i > (p+1)/K$ and $k < r$, either $B_k(x)$ or $B_r(x_i)$ will be 0. The other case can be similarly proved.

Lemma 9.9. Suppose $g(x) = \exp(-b|x|)$ with $b \neq 0$.

BAxMj^) = [i + OUKKY 1 }} g(^). 

Proof of Lemma 9.9. Take a Taylor expansion of $g(x)$ at the point
Hence if we drop the term g(j^)0{(Kh n )~ 1 } in the above equality, 

Lemma 9.10. Assume $\psi$ is a complex number and $|\psi| = 1$. For any nonnegative integer $\ell$,

$$\int x^{\ell} e^{-\psi x}\,dx = -e^{-\psi x} \sum_{k=1}^{\ell+1} \frac{\ell!}{(\ell-k+1)!}\, x^{\ell-k+1}\, \bar{\psi}^{k},$$

where $\bar{\psi}$ is the conjugate of $\psi$.

Proof of Lemma 9.10. The results of indefinite integrals of $\int x^{\ell} e^{ax}\cos(bx)\,dx$ and $\int x^{\ell} e^{ax}\sin(bx)\,dx$ are given by results 3 and 4 on page 230 of Gradshteyn and Ryzhik (2007).



Lemma 9.11. Assume $|\psi| = 1$ with positive real part. For any nonnegative integer $\ell$,

$$\int_0^{\infty} x^{\ell} e^{-\psi x}\,dx = \ell!\, \bar{\psi}^{\ell+1},$$

where $\bar{\psi}$ is the conjugate of $\psi$.

Proof of Lemma 9.11. See Lemma 9.10.
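The two integral formulas can be cross-checked against numerical quadrature. The sketch below assumes the antiderivative in Lemma 9.10 has the form $-e^{-\psi x}\sum_{k=1}^{\ell+1} \frac{\ell!}{(\ell-k+1)!} x^{\ell-k+1}\bar{\psi}^k$ and compares it, together with the half-line formula of Lemma 9.11, to a midpoint rule. The function names, the choice $\psi = e^{\mathrm{i}\pi/3}$, $\ell = 2$, and the integration limits are illustrative.

```python
import math
import numpy as np

def antiderivative(x, psi, ell):
    """Assumed antiderivative of x**ell * exp(-psi * x) for |psi| = 1."""
    terms = [
        math.factorial(ell) / math.factorial(ell - k + 1)
        * x ** (ell - k + 1) * np.conj(psi) ** k
        for k in range(1, ell + 2)
    ]
    return -np.exp(-psi * x) * sum(terms)

def midpoint_integral(psi, ell, lower, upper, num_cells=400_000):
    """Midpoint-rule approximation of the integral of x**ell * exp(-psi * x)."""
    step = (upper - lower) / num_cells
    mid = lower + (np.arange(num_cells) + 0.5) * step
    return step * np.sum(mid ** ell * np.exp(-psi * mid))

psi = np.exp(1j * np.pi / 3)   # unit modulus, positive real part
ell = 2

# Finite interval: antiderivative difference versus quadrature.
finite_closed = antiderivative(4.0, psi, ell) - antiderivative(1.0, psi, ell)
finite_numeric = midpoint_integral(psi, ell, 1.0, 4.0)

# Half line: Lemma 9.11 versus quadrature truncated at 60.
half_line_closed = math.factorial(ell) * np.conj(psi) ** (ell + 1)
half_line_numeric = midpoint_integral(psi, ell, 0.0, 60.0)

print(abs(finite_closed - finite_numeric), abs(half_line_closed - half_line_numeric))
```

Both differences are at the level of the quadrature error, and the half-line value is close to $2\bar{\psi}^3 = -2$ for this choice of $\psi$.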
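Lemma 9.12. If $\ell$ is even and $2 \le \ell \le 2m - 2$,

$$\sum_{\nu=1}^{m} \psi_\nu^{\ell} = 0.$$

Proof of Lemma 9.12. Assume $\{z_1, z_2, \ldots, z_{2m}\}$ are all the roots of the equation $x^{2m} + (-1)^m = 0$. Since $\ell$ is even, we can show that $\sum_{\nu=1}^{m} \psi_\nu^{\ell} = \frac{1}{2}\sum_{i=1}^{2m} z_i^{\ell}$ because if $a + b\mathrm{i}$ is a root of $x^{2m} + (-1)^m = 0$, then $\pm a \pm b\mathrm{i}$ are also roots. Assume $m$ is odd first. Let $\omega = e^{\pi \mathrm{i}/m}$. Note that $\omega$ is a primitive root of $x^{2m} = 1$, and we can organize $\{z_1, \ldots, z_{2m}\}$ in such a way that $z_i = \omega^{i}$. It follows that

$$\sum_{i=1}^{2m} z_i^{\ell} = \sum_{i=1}^{2m} \omega^{i\ell} = \omega^{\ell}\, \frac{1 - \omega^{2m\ell}}{1 - \omega^{\ell}} = 0.$$

For the case when $m$ is even, let $\omega_0 = e^{\pi \mathrm{i}/(2m)}$. We can also write $z_i = \omega_0^{1+2i}$; then

$$\sum_{i=1}^{2m} z_i^{\ell} = \sum_{i=1}^{2m} \omega_0^{\ell(1+2i)} = \omega_0^{3\ell}\, \frac{1 - \omega_0^{4m\ell}}{1 - \omega_0^{2\ell}} = 0.$$

Lemma 9.13.

$$\int_{-\infty}^{\infty} x^{\ell} H_m(x)\,dx = \begin{cases} 1, & \ell = 0, \\ 0, & \ell \text{ is odd}, \\ 0, & \ell \text{ is even and } 2 \le \ell \le 2m-2, \\ (-1)^{m+1}(2m)!, & \ell = 2m. \end{cases}$$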
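Proof of Lemma 9.13. Since $H_m(x)$ is symmetric about 0, the result for odd $\ell$ is obvious. Assume $\ell$ is even. By Lemma 9.11,

$$\int_{-\infty}^{\infty} x^{\ell} H_m(x)\,dx = \frac{\ell!}{m} \sum_{\nu=1}^{m} \psi_\nu \bar{\psi}_\nu^{\ell+1} = \frac{\ell!}{m} \sum_{\nu=1}^{m} \bar{\psi}_\nu^{\ell}.$$

If $\ell = 0$, $\int_{-\infty}^{\infty} H_m(x)\,dx = \frac{1}{m}\sum_{\nu=1}^{m} 1 = 1$ as desired. If $\ell = 2m$, $\int_{-\infty}^{\infty} x^{2m} H_m(x)\,dx = (-1)^{m+1}(2m)!$, also as desired. The case when $\ell$ is even and $2 \le \ell \le 2m - 2$ is proved by Lemma 9.12.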
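The moment table of Lemma 9.13 can be reproduced numerically. The sketch below assumes the interior equivalent kernel is $H_m(x) = \sum_{\nu=1}^{m} \frac{\psi_\nu}{2m} \exp(-\psi_\nu |x|)$, which is consistent with the closed forms for m = 2 and m = 3 in Section 5. It evaluates the even moments through $\frac{\ell!}{m}\sum_\nu \bar{\psi}_\nu^{\ell}$ and cross-checks one case by quadrature; all function names are hypothetical.

```python
import math
import numpy as np

def roots_with_positive_real_part(m):
    """The m roots of x^(2m) + (-1)^m = 0 whose real part is positive."""
    k = np.arange(2 * m)
    all_roots = np.exp(1j * np.pi * (m + 1 + 2 * k) / (2 * m))
    return all_roots[all_roots.real > 1e-9]

def interior_kernel(x, m):
    """Assumed form H_m(x) = sum_v psi_v / (2m) * exp(-psi_v * |x|)."""
    psi = roots_with_positive_real_part(m)
    x = np.abs(np.asarray(x, dtype=float))
    return (psi[None, :] / (2 * m) * np.exp(-np.outer(x, psi))).sum(axis=1).real

def moment_closed_form(m, ell):
    """Moment of order ell of H_m over the real line via Lemma 9.11."""
    if ell % 2 == 1:
        return 0.0   # H_m is symmetric about 0
    psi = roots_with_positive_real_part(m)
    return (math.factorial(ell) / m * np.sum(np.conj(psi) ** ell)).real

def moment_numeric(m, ell, upper=120.0, num_points=600_001):
    """Trapezoidal moment of even order, using symmetry of H_m."""
    x = np.linspace(0.0, upper, num_points)
    values = x ** ell * interior_kernel(x, m)
    step = x[1] - x[0]
    return 2.0 * step * (values.sum() - 0.5 * (values[0] + values[-1]))

# Moments of order 0, 2, ..., 2m for m = 3: expected [1, 0, 0, 720].
even_moments_m3 = [moment_closed_form(3, ell) for ell in range(0, 7, 2)]
numeric_m2_order4 = moment_numeric(2, 4)   # expected (-1)^(2+1) * 4! = -24
print(even_moments_m3, numeric_m2_order4)
```

The vanishing intermediate moments are the content of Lemma 9.12, and the last entry reproduces $(-1)^{m+1}(2m)!$.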